Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks
Safwan Shaheer, G.M. Refatul Islam, Mohammad Rafid Hamid, Tahsin Zaman Jilan

TL;DR
This paper introduces innovative, automated defense mechanisms against prompt injection attacks in small open-source LLMs, demonstrating significant improvements in security and attack mitigation through a new iterative framework.
Contribution
It presents a novel defense framework using seed defenses and iterative refinement, specifically targeting prompt injection vulnerabilities in LLaMA models.
Findings
Defense strategies significantly reduce attack success rates.
The approach improves detection of goal-hijacking attacks.
Enhanced defenses lower false positive detection rates.
Abstract
In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models. We introduce novel defense mechanisms capable of generating automatic defenses and systematically evaluate said generated defenses against a comprehensive set of benchmarked attacks. Thus, we empirically demonstrated the improvement proposed by our approach in mitigating goal-hijacking vulnerabilities in LLMs. Our work recognizes the increasing relevance of small open-sourced LLMs and their potential for broad deployments on edge devices, aligning with future trends in LLM applications. We contribute to the greater ecosystem of open-source LLMs and their security in the following: (1) assessing present prompt-based defenses against the latest attacks, (2) introducing a new framework…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Web Application Security Vulnerabilities · Information and Cyber Security
