Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

Safwan Shaheer; G.M. Refatul Islam; Mohammad Rafid Hamid; Tahsin Zaman Jilan

arXiv:2512.16307·cs.CR·December 19, 2025

Beyond the Benchmark: Innovative Defenses Against Prompt Injection Attacks

Safwan Shaheer, G.M. Refatul Islam, Mohammad Rafid Hamid, Tahsin Zaman Jilan

PDF

Open Access

TL;DR

This paper introduces innovative, automated defense mechanisms against prompt injection attacks in small open-source LLMs, demonstrating significant improvements in security and attack mitigation through a new iterative framework.

Contribution

It presents a novel defense framework using seed defenses and iterative refinement, specifically targeting prompt injection vulnerabilities in LLaMA models.

Findings

01

Defense strategies significantly reduce attack success rates.

02

The approach improves detection of goal-hijacking attacks.

03

Enhanced defenses lower false positive detection rates.

Abstract

In this fast-evolving area of LLMs, our paper discusses the significant security risk presented by prompt injection attacks. It focuses on small open-sourced models, specifically the LLaMA family of models. We introduce novel defense mechanisms capable of generating automatic defenses and systematically evaluate said generated defenses against a comprehensive set of benchmarked attacks. Thus, we empirically demonstrated the improvement proposed by our approach in mitigating goal-hijacking vulnerabilities in LLMs. Our work recognizes the increasing relevance of small open-sourced LLMs and their potential for broad deployments on edge devices, aligning with future trends in LLM applications. We contribute to the greater ecosystem of open-source LLMs and their security in the following: (1) assessing present prompt-based defenses against the latest attacks, (2) introducing a new framework…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Web Application Security Vulnerabilities · Information and Cyber Security