Incalmo: An Autonomous LLM-assisted System for Red Teaming Multi-Host Networks
Brian Singer, Keane Lucas, Lakshmi Adiga, Meghna Jain, Lujo Bauer, Vyas Sekar

TL;DR
Incalmo is an autonomous LLM-assisted system for red teaming multi-host networks. On the MHBench benchmark it attacks far more networks than existing LLM-assisted systems, with each attack finishing in under an hour.
Contribution
This paper introduces Incalmo, a novel system that improves LLM-assisted red teaming by using high-level planning and domain-specific agents for multi-host network attacks.
Findings
Incalmo successfully attacked 37 out of 40 networks in MHBench.
State-of-the-art systems succeeded in only 3 out of 40 networks.
Incalmo's attacks took 12-54 minutes and cost less than $15 in LLM credits.
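As a quick check on the gap between the two results, the counts above can be converted to success rates. The sketch below uses only the figures listed in the Findings. The variable and function names are illustrative, and "success" is assumed to mean whatever the paper's benchmark counts as a successful attack.

```python
# Counts taken from the Findings above; names here are illustrative only.
TOTAL_NETWORKS = 40
successes = {"Incalmo": 37, "state-of-the-art systems": 3}


def success_rate(succeeded: int, total: int) -> float:
    """Return the share of networks attacked successfully, as a percentage."""
    return 100.0 * succeeded / total


rates = {name: success_rate(n, TOTAL_NETWORKS) for name, n in successes.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% of {TOTAL_NETWORKS} networks")
```

Under these counts, Incalmo succeeds on 92.5% of the networks and the prior systems on 7.5%.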
Abstract
Security operators use red teams to simulate real attackers and proactively find defense gaps. In realistic enterprise settings, this involves executing multi-host network attacks spanning many "stepping stone" hosts. Unfortunately, red teams are expensive and entail significant expertise and effort. Given the promise of LLMs in CTF challenges, we first analyze if LLMs can autonomously execute multi-host red team exercises. We find that state-of-the-art LLM-assisted offense systems (e.g., PentestGPT, CyberSecEval3) with leading LLMs (e.g., Sonnet 4, Gemini 2.5 Pro) are unable to do so. Building on our observations in understanding the failure modes of state-of-the-art systems, we argue the need to improve the abstractions and interfaces for LLM-assisted red teaming. Based on this insight, we present the design and implementation of Incalmo, an LLM-assisted system for autonomously red…
Taxonomy
Topics: Network Security and Intrusion Detection · Smart Grid Security and Resilience · Software-Defined Networks and 5G
