Evaluating the efficacy of LLM Safety Solutions : The Palit Benchmark Dataset
Sayon Palit, Daniel Woods

TL;DR
This paper presents a benchmark dataset and comparative analysis of security tools for evaluating Large Language Model safety, highlighting current limitations and proposing improvements for better detection of malicious prompts.
Contribution
The study introduces the Palit Benchmark Dataset for evaluating LLM safety tools and provides a comprehensive comparison of 13 solutions, identifying top performers and areas for improvement.
Findings
Lakera Guard and ProtectAI LLM Guard are the top tools.
Baseline ChatGPT-3.5-Turbo has too many false positives.
Recommendations include increased transparency and better metrics.
Abstract
Large Language Models (LLMs) are increasingly integrated into critical systems in industries like healthcare and finance. Users can often submit queries to LLM-enabled chatbots, some of which can enrich responses with information retrieved from internal databases storing sensitive data. This gives rise to a range of attacks in which a user submits a malicious query and the LLM-system outputs a response that creates harm to the owner, such as leaking internal data or creating legal liability by harming a third-party. While security tools are being developed to counter these threats, there is little formal evaluation of their effectiveness and usability. This study addresses this gap by conducting a thorough comparative analysis of LLM security tools. We identified 13 solutions (9 closed-source, 4 open-source), but only 7 were evaluated due to a lack of participation by proprietary model…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Healthcare and Education · AI in Service Interactions · Ethics and Social Impacts of AI
