TL;DR
PIArena is a comprehensive platform designed to evaluate prompt injection attacks and defenses, revealing their limitations and aiding in developing more robust solutions.
Contribution
The paper introduces PIArena, a unified, extensible platform for prompt injection evaluation, including adaptive attack strategies and extensive benchmarking capabilities.
Findings
State-of-the-art defenses show limited generalizability.
Adaptive attacks expose vulnerabilities in current defenses.
Injected task alignment poses fundamental challenges.
Abstract
Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This makes it challenging to reliably compare defenses, understand their true robustness under diverse attacks, or assess how well they generalize across tasks and benchmarks. For instance, many defenses initially reported as effective were later found to exhibit limited robustness on diverse datasets and attacks. To bridge this gap, we introduce PIArena, a unified and extensible platform for prompt injection evaluation that enables users to easily integrate state-of-the-art attacks and defenses and evaluate them across a variety of existing and new benchmarks. We also design a dynamic strategy-based attack that adaptively optimizes injected prompts…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
