EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
Talor Abramovich, Meet Udeshi, Minghao Shao, Kilian Lieret, Haoran Xi, Kimberly Milner, Sofija Jancheska, John Yang, Carlos E. Jimenez, Farshad Khorrami, Prashanth Krishnamurthy, Brendan Dolan-Gavitt, Muhammad Shafique, Karthik Narasimhan, Ramesh Karri, Ofir Press

TL;DR
EnIGMA introduces interactive tools for LM agents, significantly enhancing their ability to identify and exploit security vulnerabilities in CTF challenges, achieving state-of-the-art performance across multiple benchmarks.
Contribution
The paper presents novel interactive tools and interfaces that enable LM agents to run utilities like debuggers, improving cybersecurity challenge-solving capabilities.
Findings
Achieved state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench.
Demonstrated substantial performance improvements from the new interactive tools.
Developed methods to quantify data leakage and identified the soliloquizing phenomenon.
Abstract
Although language model (LM) agents have demonstrated increased performance in multiple domains, including coding and web-browsing, their success in cybersecurity has been limited. We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent's ability to find and exploit security vulnerabilities, focusing on interactive terminal programs. These novel Interactive Agent Tools enable LM agents, for the first time, to run interactive utilities, such as a debugger and a server connection tool, which are essential for solving these challenges. Empirical analysis on 390 CTF challenges across four benchmarks demonstrates that these new tools and interfaces substantially improve our agent's performance, achieving state-of-the-art results on NYU CTF, Intercode-CTF, and CyBench. Finally, we analyze data…
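The key idea in the abstract is that an agent can drive a long-running interactive program across several turns instead of issuing one-shot commands. The sketch below illustrates that general pattern under stated assumptions; it is not EnIGMA's implementation. The class `InteractiveSession`, its `send`/`close` methods, and the sentinel-echo scheme are all hypothetical, and a plain `sh` process stands in for a debugger or server connection.

```python
import subprocess
import uuid


class InteractiveSession:
    """Hypothetical wrapper that keeps one interactive process alive across
    agent turns, so state (variables, breakpoints, connections) persists.
    Illustrative sketch only, not EnIGMA's actual tool interface."""

    def __init__(self, argv):
        # Text-mode pipes; stderr is merged into stdout so the agent sees both.
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )

    def send(self, command):
        """Send one command and return the output it produced."""
        # Assumption: the wrapped program is a shell, so we can ask it to echo
        # a unique marker once the command finishes. That marker tells us where
        # this command's output ends without relying on timeouts.
        marker = f"__DONE_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho {marker}\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # EOF: the process exited unexpectedly
                break
            if line.rstrip("\n") == marker:
                break
            lines.append(line)
        return "".join(lines)

    def close(self):
        # Closing stdin lets the shell exit; wait() reaps the process.
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    session = InteractiveSession(["sh"])
    session.send("x=42")              # state is set in the live process...
    print(session.send("echo $x"))    # ...and is still there on the next turn
    session.close()
```

The marker scheme only works because `sh` can echo on request. A real debugger or network session would need a different end-of-output signal, for example waiting for the `(gdb)` prompt or for a read timeout; that choice is left open here as an assumption.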
