Can LLMs Hack Enterprise Networks? -- Replicated Computational Results (RCR) Report

Andreas Happe; Jürgen Cito

arXiv:2603.01789 · cs.CR · March 3, 2026

Open Access

TL;DR

This paper empirically evaluates how well different large language models perform penetration testing of enterprise networks, specifically in Microsoft Active Directory assumed-breach simulations, and provides reproducible artifacts and analysis scripts.

Contribution

The report documents the artifacts, evaluation setup, and analysis scripts needed to replicate the paper's assessment of LLMs in enterprise network penetration testing, improving transparency and reproducibility.

Findings

01 LLMs can carry out penetration tests against simulated enterprise networks, with varying success rates.

02 The report provides detailed artifacts and scripts for replication.

03 The evaluation offers insights into LLM capabilities in cybersecurity contexts.

Abstract

This is the Replicated Computational Results (RCR) Report for the paper "Can LLMs Hack Enterprise Networks?" The paper empirically investigates the efficacy and effectiveness of different LLMs for penetration-testing enterprise networks, i.e., Microsoft Active Directory Assumed-Breach Simulations. This RCR report describes the artifacts used in the paper, explains how to create an evaluation setup, and highlights the analysis scripts provided within our prototype.

Taxonomy

Topics: Software System Performance and Reliability · Software-Defined Networks and 5G · Information and Cyber Security