LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Sahar Abdelnabi; Aideen Fay; Ahmed Salem; Egor Zverev; Kai-Chieh Liao; Chi-Huang Liu; Chun-Chih Kuo; Jannis Weigend; Danyael Manlangit; Alex Apostolov; Haris Umair; Jo\~ao Donato; Masayuki Kawakita; Athar Mahboob; Tran Huu Bach; Tsun-Han Chiang; Myeongjin Cho; Hajin Choi; Byeonghyeon Kim; Hyeonjin Lee; Benjamin Pannell; Conor McCauley; Mark Russinovich; Andrew Paverd; Giovanni Cherubin

arXiv:2506.09956·cs.CR·June 12, 2025

LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge

Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, Jo\~ao Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi

PDF

Open Access 2 Repos 3 Reviews

TL;DR

This paper introduces LLMail-Inject, a comprehensive dataset from a realistic adaptive prompt injection challenge targeting LLM-based email assistants, highlighting vulnerabilities and aiding future defenses.

Contribution

It provides the first large-scale dataset and analysis from a realistic adaptive prompt injection challenge, facilitating research on instruction-data separation in LLMs.

Findings

01

208,095 attack submissions analyzed

02

Identified vulnerabilities across multiple LLM architectures

03

Dataset enables new insights into prompt injection defenses

Abstract

Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks can have wide security and privacy implications, and many real-world LLM-based applications remain vulnerable. We present the results of LLMail-Inject, a public challenge simulating a realistic scenario in which participants adaptively attempted to inject malicious instructions into emails in order to trigger unauthorized tool calls in an LLM-based email assistant. The challenge spanned multiple defense strategies, LLM architectures, and retrieval configurations, resulting in a dataset of 208,095 unique attack submissions from 839 participants. We release the challenge code, the full dataset of…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01Rating 4Confidence 4

Strengths

1. All attacks were human-generated by participants attempting to solve real challenges, avoiding the template-based limitations of existing datasets. 2. The competition-based approach successfully gathered over 200k unique attack prompts with rich diversity in attack strategies, representing an unprecedented scale compared to existing benchmarks. 3. The paper is well-structured with overall good visualizations (except Figure 3). The systematic comparison of defenses across multiple dimensions

Weaknesses

1. **Representativeness of the Scenario:** This paper focuses on email agents as the attack scenario. While email processing is a common use case, it may not capture the full diversity of real-world applications where prompt injection attacks can occur, such as web search, coding assistants, or customer support bots. The authors are encouraged to discuss the generalizability of their findings beyond email agents and discuss whether the dataset can be adapted to other scenarios. 2. **LLM Selectio

Reviewer 02Rating 6Confidence 4

Strengths

1. The proposed dataset is collected from a large-scale, real-world competition, which makes the collected data highly diverse and realistic, providing valuable resources and insights for future research on LLM safety and prompt injection defenses. 2. The attack strategies are contextually relevant and reflect how adaptive prompt injection attacks may occur in practical LLM applications, such as email assistants. 3. The paper provides comprehensive analyses across multiple difficulty levels an

Weaknesses

1. The paper could include more recent and stronger baselines for comparison, such as StruQ [1], SecAlign [2], and Meta-SecAlign [3], which represent the state-of-the-art fine-tuning-based defenses against prompt injection. 2. The proposed benchmark focuses solely on the email scenario, which, while realistic, may limit the generalizability of the findings. It would be valuable to include other application contexts, such as document editing, coding, or web agents. 3. Although the dataset captu

Reviewer 03Rating 6Confidence 4

Strengths

1. The paper devotes significant efforts on building the community of prompt injection, the top-1 threat to LLM-integrated applications. With a complex competition design, the competition collected a very large human-generated high-quality prompt injection dataset, which would be a great asset for future assessment of the model, given that current attack benchmarks are saturating. 2. The competition is built on a practical attack scenario, email assistant, where an LLM is very suitable for handl

Weaknesses

1. The competition assumes that the attacker knows the attack target string (trigger the model’s send_email tool call with arguments: [email protected], content=confirmation). However, in a practical attack scenario, how does the attacker know the name/parameters of a function call that will lead to malicious actions? That information is generally kept private in the LLM system. 2. The selected two victim models are not strong nor representative enough. Phi is a 14B small model wi

Code & Models

Repositories

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques