Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks
Andreas Happe, Jürgen Cito

TL;DR
This paper demonstrates that autonomous LLM-driven systems can effectively perform penetration testing on enterprise Active Directory networks, reducing costs and expanding access to cybersecurity assessments.
Contribution
It introduces the first fully autonomous LLM-based framework for Assumed Breach penetration testing on real enterprise networks, showcasing its capabilities and limitations.
Findings
Autonomous LLMs can adapt attack strategies dynamically.
They can perform multi-vector attacks including social engineering.
Costs are lower than those of traditional human penetration testers.
Abstract
Enterprise penetration-testing is often limited by high operational costs and the scarcity of human expertise. This paper investigates the feasibility and effectiveness of using Large Language Model (LLM)-driven autonomous systems to address these challenges in real-world Active Directory (AD) enterprise networks. We introduce a novel prototype designed to employ LLMs to autonomously perform Assumed Breach penetration-testing against enterprise networks. Our system represents the first demonstration of a fully autonomous, LLM-driven framework capable of compromising accounts within a real-life Microsoft Active Directory testbed, GOAD. We perform our empirical evaluation using five LLMs, comparing reasoning to non-reasoning models as well as including open-weight models. Through quantitative and qualitative analysis, incorporating insights from cybersecurity experts, we demonstrate…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Service-Oriented Architecture and Web Services
