Evaluating Generalization Mechanisms in Autonomous Cyber Attack Agents

Ondřej Lukáš; Jihoon Shin; Emilia Rivas; Diego Forni; Maria Rigaki; Carlos Catania; Aritran Piplai; Christopher Kiekintveld; Sebastian Garcia

arXiv:2603.10041 · cs.CR · March 12, 2026

TL;DR

This paper evaluates how well different families of autonomous cyber attack agents generalize to an unseen network configuration. It finds that IP address-space changes alone can significantly impair policy transfer, and that LLM-based agents perform best but incur higher computational cost and offer less transparency.

Contribution

The paper isolates a minimal shift in the network scenario (unseen host/subnet IP reassignment in an otherwise fixed enterprise network) to test agent generalization, and compares traditional RL, adaptation, and LLM-based agents under this setting.
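The IP-reassignment shift can be made concrete with a short sketch. It assumes a hypothetical scenario of three named hosts whose roles and topology stay fixed while their addresses move into different /16 ranges; the host names, address ranges, and the `remap` helper are illustrative and not taken from NetSecGame or the paper. Five variants serve as training configurations and a sixth, disjoint range is held out for testing, mirroring the five-train/one-test protocol described in the abstract.

```python
import ipaddress

# Hypothetical fixed scenario: host role -> address in the base layout.
# Roles and addresses are illustrative only, not from NetSecGame.
BASE_SUPERNET = ipaddress.ip_network("192.168.0.0/16")
BASE_HOSTS = {
    "client": "192.168.1.10",
    "web_server": "192.168.2.20",
    "db_server": "192.168.2.30",
}


def remap(base_hosts, new_supernet):
    """Move each host into new_supernet, keeping its offset from the base supernet.

    Topology and host roles are unchanged; only the address space shifts.
    """
    new_net = ipaddress.ip_network(new_supernet)
    if new_net.prefixlen != BASE_SUPERNET.prefixlen:
        raise ValueError("new supernet must have the same size as the base one")
    remapped = {}
    for role, addr in base_hosts.items():
        offset = int(ipaddress.ip_address(addr)) - int(BASE_SUPERNET.network_address)
        remapped[role] = str(new_net.network_address + offset)
    return remapped


# Six hypothetical IP-range variants: the first five for training,
# the sixth held out as the unseen test configuration.
VARIANTS = [
    "10.0.0.0/16",
    "10.1.0.0/16",
    "10.2.0.0/16",
    "172.16.0.0/16",
    "172.17.0.0/16",
    "172.18.0.0/16",
]
train_variants = [remap(BASE_HOSTS, v) for v in VARIANTS[:5]]
test_variant = remap(BASE_HOSTS, VARIANTS[5])

# Sanity check: no test address appears in any training variant.
train_addrs = {a for variant in train_variants for a in variant.values()}
assert train_addrs.isdisjoint(test_variant.values())

print(test_variant)
# {'client': '172.18.1.10', 'web_server': '172.18.2.20', 'db_server': '172.18.2.30'}
```

Because only the addresses change, a policy that keys its actions on specific IPs seen during training has nothing to match in the held-out variant, which is one plausible way an address-space shift can break long-horizon attack policies.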

Findings

1. Some adaptation methods show partial transfer but degrade significantly under unseen IP reassignment.
2. Prompt-driven LLM agents achieve the highest success on the unseen scenario.
3. LLM agents incur higher inference costs and offer lower transparency.

Abstract

Autonomous offensive agents often fail to transfer beyond the networks on which they are trained. We isolate a minimal but fundamental shift -- unseen host/subnet IP reassignment in an otherwise fixed enterprise scenario -- and evaluate attacker generalization in the NetSecGame environment. Agents are trained on five IP-range variants and tested on a sixth unseen variant; only the meta-learning agent may adapt at test time. We compare three agent families (traditional RL, adaptation agents, and LLM-based agents) and use action-distribution-based behavioral/XAI analyses to localize failure modes. Some adaptation methods show partial transfer but significant degradation under unseen reassignment, indicating that even address-space changes can break long-horizon attack policies. Under our evaluation protocol and agent-specific assumptions, prompt-driven pretrained LLM agents achieve the…
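The abstract mentions action-distribution-based behavioral analyses for localizing failure modes but does not specify the metric in this summary. One hedged way to run such a comparison, assumed here for illustration rather than taken from the paper, is to compute an agent's empirical action-type distribution on a training variant and on the unseen variant, then measure their Jensen-Shannon divergence. The action-type labels (loosely modeled on NetSecGame's action types) and the logged trajectories are hypothetical.

```python
import math
from collections import Counter

# Assumed action-type labels, loosely modeled on NetSecGame's action types;
# the exact set used in the paper may differ.
ACTION_TYPES = ["ScanNetwork", "FindServices", "ExploitService", "FindData", "ExfiltrateData"]


def action_distribution(actions):
    """Empirical frequency of each action type over a list of action labels."""
    counts = Counter(a for a in actions if a in ACTION_TYPES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recognized actions to summarize")
    return [counts[a] / total for a in ACTION_TYPES]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical logs: the agent completes the attack chain on a training variant
# but keeps re-scanning on the unseen variant.
train_actions = ["ScanNetwork", "FindServices", "ExploitService",
                 "ScanNetwork", "FindServices", "FindData", "ExfiltrateData"]
test_actions = ["ScanNetwork"] * 8 + ["FindServices"] * 2

p_train = action_distribution(train_actions)
p_test = action_distribution(test_actions)
print(f"JS divergence (train vs. test): {js_divergence(p_train, p_test):.3f}")
```

With base-2 logarithms the divergence lies between 0 (identical action mixes) and 1 (no action types in common), so a large value on the unseen variant flags where behavior shifted, for example an agent that stalls in reconnaissance instead of progressing to exploitation.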

Taxonomy

Topics: Network Security and Intrusion Detection · Information and Cyber Security · Adversarial Robustness in Machine Learning