Incalmo: An Autonomous LLM-assisted System for Red Teaming Multi-Host Networks

Brian Singer; Keane Lucas; Lakshmi Adiga; Meghna Jain; Lujo Bauer; Vyas Sekar

arXiv:2501.16466 · cs.CR · November 25, 2025

TL;DR

Incalmo is an autonomous LLM-assisted system for red teaming multi-host networks. On the MHBench benchmark it successfully attacked 37 of 40 networks, whereas state-of-the-art systems succeeded in only 3 of 40, and its attacks took 12-54 minutes each.

Contribution

This paper introduces Incalmo, a novel system that improves LLM-assisted red teaming by using high-level planning and domain-specific agents for multi-host network attacks.

Findings

01. Incalmo successfully attacked 37 out of 40 networks in MHBench.

02. State-of-the-art systems succeeded in only 3 out of 40 networks.

03. Incalmo's attacks took 12-54 minutes and cost less than $15 in LLM credits.
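
To compare the reported counts directly, they can be converted to success rates. The short sketch below uses only the figures listed in the findings (37 of 40 and 3 of 40); the function and variable names are illustrative and are not taken from the paper or its repository.

```python
# Counts taken from the findings above; all names here are illustrative.
TOTAL_NETWORKS = 40
INCALMO_SUCCESSES = 37
BASELINE_SUCCESSES = 3


def success_rate(successes: int, total: int) -> float:
    """Return the share of successfully attacked networks as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total


incalmo_rate = success_rate(INCALMO_SUCCESSES, TOTAL_NETWORKS)
baseline_rate = success_rate(BASELINE_SUCCESSES, TOTAL_NETWORKS)

print(f"Incalmo: {incalmo_rate:.1f}% of networks")
print(f"State-of-the-art systems: {baseline_rate:.1f}% of networks")
print(f"Gap: {incalmo_rate - baseline_rate:.1f} percentage points")
```

Under these counts the rates are 92.5% and 7.5%, a gap of 85 percentage points on the same 40 networks.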

Abstract

Security operators use red teams to simulate real attackers and proactively find defense gaps. In realistic enterprise settings, this involves executing multi-host network attacks spanning many "stepping stone" hosts. Unfortunately, red teams are expensive and entail significant expertise and effort. Given the promise of LLMs in CTF challenges, we first analyze if LLMs can autonomously execute multi-host red team exercises. We find that state-of-the-art LLM-assisted offense systems (e.g., PentestGPT, CyberSecEval3) with leading LLMs (e.g., Sonnet 4, Gemini 2.5 Pro) are unable to do so. Building on our observations in understanding the failure modes of state-of-the-art systems, we argue the need to improve the abstractions and interfaces for LLM-assisted red teaming. Based on this insight, we present the design and implementation of Incalmo, an LLM-assisted system for autonomously red…

Code & Models

Repositories

bsinger98/Incalmo (Official)

Taxonomy

Topics: Network Security and Intrusion Detection · Smart Grid Security and Resilience · Software-Defined Networks and 5G