A Critical Evaluation of Defenses against Prompt Injection Attacks

Yuqi Jia; Zedian Shao; Yupei Liu; Jinyuan Jia; Dawn Song; Neil Zhenqiang Gong

arXiv:2505.18333·cs.CR·May 27, 2025

A Critical Evaluation of Defenses against Prompt Injection Attacks

Yuqi Jia, Zedian Shao, Yupei Liu, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong

PDF

Open Access 1 Repo

TL;DR

This paper critically evaluates existing defenses against prompt injection attacks on LLMs, revealing that many are less effective than previously claimed when assessed with a comprehensive, principled methodology.

Contribution

It introduces a rigorous evaluation framework for defenses against prompt injection, highlighting gaps in prior assessments and guiding future defense development.

Findings

01

Existing defenses are less effective than previously claimed.

02

Many defenses compromise LLM utility under evaluation.

03

The paper proposes a comprehensive evaluation methodology.

Abstract

Large Language Models (LLMs) are vulnerable to prompt injection attacks, and several defenses have recently been proposed, often claiming to mitigate these attacks successfully. However, we argue that existing studies lack a principled approach to evaluating these defenses. In this paper, we argue the need to assess defenses across two critical dimensions: (1) effectiveness, measured against both existing and adaptive prompt injection attacks involving diverse target and injected prompts, and (2) general-purpose utility, ensuring that the defense does not compromise the foundational capabilities of the LLM. Our critical evaluation reveals that prior studies have not followed such a comprehensive evaluation methodology. When assessed using this principled approach, we show that existing defenses are not as successful as previously reported. This work provides a foundation for evaluating…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

pieval123/pieval
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdvanced Malware Detection Techniques