How Do Prompt Injection Scanners Perform? A Benchmark.
First of all, we are proud and honoured to share that we have received a Google Patch Reward for our contributions to securing open-source LLMs. This recognition further solidifies the trust we’ve built with the open-source community and the subsidy will greatly help us improve our library in securing LLMs in production at the enterprise.
When we released our first version of LLM Guard, we were happy to see that there were several existing OS contributions (i.e. Deepset) that we could tap into and embed into our library to secure enterprise LLM applications against prompt injections. However, as the space has been moving at lightning speed we also saw novel attack techniques and research emerge that without a continued effort to keep existing OS prompt injection detection models up to date would break any LLM in the market. The only continued effort came from closed source players, of which we saw 1-2 emerging every single week over the past 2 months, that are less accessible to developers looking for an OS-viable alternative. Besides that, we always had the opinion that prompt injections would almost be like chasing your tail where any moat achieved in your detection capabilities is short-lived - seemingly reminiscent of the early malware days. This is especially true for those OS/closed source players that rely on a vector database they need to keep up to date for known prompt injections. While we have decided to take a different approach, training our proprietary models based on known attacks that can discern known patterns in unknown attacks, the best approach is still to be determined.
Therefore, we decided to train our proprietary prompt injection model and open-source it to the community. We are also happy to share that our model is greatly outperforming previous state-of-the-art alternatives in the market.
We have previously also been compared against other players or alternatives in the market (i.e. Rebuff, Lakera, etc.). For that reason, we decided to create a benchmark that allows anyone to test our prompt injection detection model against existing alternatives in the market. We believe there is a strong need in the market to create transparency among alternatives to allow for developers to make an easier decision based on each player’s capabilities. Based on the countless conversations we had with many companies and developers, the benchmark currently tests each model against latency and accuracy for any prompt input. We also store deviations in a dataset to contribute our learnings further to the broader community in a separate article! Besides that, we do want to note that the performance of some of the models may be faster yet less accurate due to their architecture (i.e. Distelbert) or may have caching (this is an assumption). Moreover, some models may be faster yet far more expensive based on the instance they run on (i.e. CPU vs. GPU).
Below are some example tests we ran against some of the other known players out there. Do note that these are examples we chose from a random dataset. Feel free to try your own or pick any of the examples we shared.
Example 1: Safe Prompt
Example 2: Prompt Injection
If you want your prompt injection detection scanner integrated into our benchmark, give us a ping over at hello@laiyer.ai.
Stay tuned for our next article where we share our learnings on building the prompt injection detection model!