Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
Valerii Gakh, Hayretdin Bahsi

TL;DR
This paper evaluates the effectiveness of early prompt injection detection systems in LLM applications, comparing open-source solutions and proposing improvements to enhance security against prompt leak attacks.
Contribution
It provides a comprehensive analysis and comparison of existing prompt leak detection techniques, identifying their strengths and weaknesses and proposing specific improvements.
Findings
Vigil is best for minimal false positives.
Rebuff performs well for average detection needs.
Canary word checks in Vigil and Rebuff are ineffective against prompt leaks.
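To make the last finding concrete, here is a minimal sketch of a generic canary word check. It is not the actual Vigil or Rebuff API: the function names `add_canary` and `is_leaked` and the comment-style marker format are assumptions chosen for illustration. The idea is to embed a random token in the system prompt and flag any model output that contains it verbatim.

```python
import secrets
from typing import Optional, Tuple


def add_canary(system_prompt: str, canary: Optional[str] = None) -> Tuple[str, str]:
    """Prepend a canary marker to the system prompt.

    Hypothetical helper: the marker format is an assumption, not the
    format used by any particular detection library.
    Returns (guarded_prompt, canary).
    """
    if canary is None:
        canary = secrets.token_hex(8)  # 16 random hex characters
    guarded_prompt = f"<!-- {canary} -->\n{system_prompt}"
    return guarded_prompt, canary


def is_leaked(llm_output: str, canary: str) -> bool:
    """Flag the output if it contains the canary verbatim."""
    return canary in llm_output


if __name__ == "__main__":
    guarded, canary = add_canary("You are a helpful assistant.")
    # Simulated outputs; no real LLM is called in this sketch.
    print(is_leaked(guarded, canary))        # verbatim leak of the prompt
    print(is_leaked(guarded[::-1], canary))  # same prompt, emitted reversed
```

Because the check is a verbatim substring match, an output that reproduces the instructions in a transformed form (reversed, translated, or paraphrased) need not contain the canary. This is one plausible way such a check can miss a leak; the summary above does not state the authors' own explanation.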
Abstract
Prompt injection threatens novel applications that emerge from adapting LLMs for various user tasks. Newly developed LLM-based software applications are becoming more ubiquitous and diverse. However, the threat of prompt injection attacks undermines the security of these systems, as the mitigations and defenses proposed so far are insufficient. We investigated the capabilities of early prompt injection detection systems, focusing specifically on the detection performance of techniques implemented in various open-source solutions. These solutions are supposed to detect certain types of prompt injection attacks, including the prompt leak. In prompt leakage attacks, an attacker maliciously manipulates the LLM into outputting its system instructions, violating the system's confidentiality. Our study presents analyses of distinct prompt leakage detection techniques, and a…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques
