LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, Jo\~ao Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi

TL;DR
This paper introduces LLMail-Inject, a comprehensive dataset from a realistic adaptive prompt injection challenge targeting LLM-based email assistants, highlighting vulnerabilities and aiding future defenses.
Contribution
It provides the first large-scale dataset and analysis from a realistic adaptive prompt injection challenge, facilitating research on instruction-data separation in LLMs.
Findings
208,095 attack submissions analyzed
Identified vulnerabilities across multiple LLM architectures
Dataset enables new insights into prompt injection defenses
Abstract
Indirect Prompt Injection attacks exploit the inherent limitation of Large Language Models (LLMs) to distinguish between instructions and data in their inputs. Despite numerous defense proposals, the systematic evaluation against adaptive adversaries remains limited, even when successful attacks can have wide security and privacy implications, and many real-world LLM-based applications remain vulnerable. We present the results of LLMail-Inject, a public challenge simulating a realistic scenario in which participants adaptively attempted to inject malicious instructions into emails in order to trigger unauthorized tool calls in an LLM-based email assistant. The challenge spanned multiple defense strategies, LLM architectures, and retrieval configurations, resulting in a dataset of 208,095 unique attack submissions from 839 participants. We release the challenge code, the full dataset of…
Peer Reviews
Decision·Submitted to ICLR 2026
1. All attacks were human-generated by participants attempting to solve real challenges, avoiding the template-based limitations of existing datasets. 2. The competition-based approach successfully gathered over 200k unique attack prompts with rich diversity in attack strategies, representing an unprecedented scale compared to existing benchmarks. 3. The paper is well-structured with overall good visualizations (except Figure 3). The systematic comparison of defenses across multiple dimensions
1. **Representativeness of the Scenario:** This paper focuses on email agents as the attack scenario. While email processing is a common use case, it may not capture the full diversity of real-world applications where prompt injection attacks can occur, such as web search, coding assistants, or customer support bots. The authors are encouraged to discuss the generalizability of their findings beyond email agents and discuss whether the dataset can be adapted to other scenarios. 2. **LLM Selectio
1. The proposed dataset is collected from a large-scale, real-world competition, which makes the collected data highly diverse and realistic, providing valuable resources and insights for future research on LLM safety and prompt injection defenses. 2. The attack strategies are contextually relevant and reflect how adaptive prompt injection attacks may occur in practical LLM applications, such as email assistants. 3. The paper provides comprehensive analyses across multiple difficulty levels an
1. The paper could include more recent and stronger baselines for comparison, such as StruQ [1], SecAlign [2], and Meta-SecAlign [3], which represent the state-of-the-art fine-tuning-based defenses against prompt injection. 2. The proposed benchmark focuses solely on the email scenario, which, while realistic, may limit the generalizability of the findings. It would be valuable to include other application contexts, such as document editing, coding, or web agents. 3. Although the dataset captu
1. The paper devotes significant efforts on building the community of prompt injection, the top-1 threat to LLM-integrated applications. With a complex competition design, the competition collected a very large human-generated high-quality prompt injection dataset, which would be a great asset for future assessment of the model, given that current attack benchmarks are saturating. 2. The competition is built on a practical attack scenario, email assistant, where an LLM is very suitable for handl
1. The competition assumes that the attacker knows the attack target string (trigger the model’s send_email tool call with arguments: [email protected], content=confirmation). However, in a practical attack scenario, how does the attacker know the name/parameters of a function call that will lead to malicious actions? That information is generally kept private in the LLM system. 2. The selected two victim models are not strong nor representative enough. Phi is a 14B small model wi
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
