PIArena: A Platform for Prompt Injection Evaluation

Runpeng Geng; Chenlong Yin; Yanting Wang; Ying Chen; Jinyuan Jia

arXiv:2604.08499·cs.CR·April 10, 2026

TL;DR

PIArena is a platform for evaluating prompt injection attacks and defenses. It reveals their limitations and supports the development of more robust solutions.

Contribution

The paper introduces PIArena, a unified, extensible platform for prompt injection evaluation, including adaptive attack strategies and extensive benchmarking capabilities.

Findings

01. State-of-the-art defenses show limited generalizability.
02. Adaptive attacks expose vulnerabilities in current defenses.
03. Injected task alignment poses fundamental challenges.

Abstract

Prompt injection attacks pose serious security risks across a wide range of real-world applications. While receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This makes it challenging to reliably compare defenses, understand their true robustness under diverse attacks, or assess how well they generalize across tasks and benchmarks. For instance, many defenses initially reported as effective were later found to exhibit limited robustness on diverse datasets and attacks. To bridge this gap, we introduce PIArena, a unified and extensible platform for prompt injection evaluation that enables users to easily integrate state-of-the-art attacks and defenses and evaluate them across a variety of existing and new benchmarks. We also design a dynamic strategy-based attack that adaptively optimizes injected prompts…

Code & Models

Repositories

sleeepeer/PIArena
github
