AttackEval: A Systematic Empirical Study of Prompt Injection Attack Effectiveness Against Large Language Models
Jackson Wang

TL;DR
This paper systematically evaluates prompt injection attack strategies against large language models, revealing vulnerabilities and guiding more robust defense designs.
Contribution
It introduces AttackEval, a comprehensive taxonomy and empirical framework for analyzing prompt injection attack effectiveness and defense resilience.
Findings
Obfuscation achieves the highest attack success rate (ASR=0.76) against defenses.
Semantic/Social attacks maintain high ASR (0.44-0.48) against intent-aware defenses.
Combining attack strategies dramatically increases ASR, e.g., OBF + EM reaches 97.6%.
Abstract
Prompt injection has emerged as a critical vulnerability in large language model (LLM) deployments, yet existing research is heavily weighted toward defenses. The attack side -- specifically, which injection strategies are most effective and why -- remains insufficiently studied.We address this gap with AttackEval, a systematic empirical study of prompt injection attack effectiveness. We construct a taxonomy of ten attack categories organized into three parent groups (Syntactic, Contextual, and Semantic/Social), populate each category with 25 carefully crafted prompts (250 total), and evaluate them against a simulated production victim system under four progressively stronger defense tiers. Experiments reveal several non-obvious findings: (1) Obfuscation (OBF) achieves the highest single-attack success rate (ASR = 0.76) against even intent-aware defenses, because it defeats both keyword…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
