On the Surprising Efficacy of LLMs for Penetration-Testing

Andreas Happe, Jürgen Cito

arXiv:2507.00829 · cs.CR · July 2, 2025

Open Access

TL;DR

This paper critically examines the unexpected effectiveness of Large Language Models in penetration testing, highlighting their capabilities, dual-use challenges, and obstacles to wider adoption in cybersecurity.

Contribution

It provides a comprehensive review of LLMs' application in penetration testing, analyzing their strengths, risks, and the barriers to safe and effective deployment.

Findings

1. LLMs excel at pattern-matching in penetration testing tasks.
2. They manage uncertainty effectively in dynamic environments.
3. Adoption faces challenges like reliability, safety, costs, and ethical concerns.

Abstract

This paper presents a critical examination of the surprising efficacy of Large Language Models (LLMs) in penetration testing. The paper thoroughly reviews the evolution of LLMs and their rapidly expanding capabilities, which render them increasingly suitable for complex penetration testing operations. It systematically details the historical adoption of LLMs in both academic research and industry, showcasing their application across various offensive security tasks and covering broader phases of the cyber kill chain. Crucially, the analysis also extends to the observed adoption of LLMs by malicious actors, underscoring the inherent dual-use challenge of this technology within the security landscape. The unexpected effectiveness of LLMs in this context is elucidated by several key factors: the strong alignment between penetration testing's reliance on pattern-matching and LLMs' core…

Taxonomy

Topics: Web Application Security Vulnerabilities · Network Security and Intrusion Detection · Information and Cyber Security