AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents

Julius Henke

arXiv:2505.10321·cs.CR·May 16, 2025

TL;DR

AutoPentest uses autonomous LLM agents built on GPT-4o to perform black-box penetration tests. In a study on three Hack The Box machines, it performed comparably to manually using the ChatGPT-4o interface, pointing to potential for cost-effective vulnerability management.

Contribution

The paper introduces AutoPentest, an autonomous LLM-powered penetration testing tool, and evaluates its capabilities and cost against the baseline of manually using ChatGPT.

Findings

01

Both AutoPentest and manual use of ChatGPT-4o complete 15-25% of the subtasks on the three Hack The Box (HTB) machines, with AutoPentest slightly ahead.

02

AutoPentest's total cost is $96.20, higher than the cost of a ChatGPT Plus subscription.

03

Further improvements and newer LLMs could make autonomous penetration testing more viable.

Abstract

A recent area of increasing research is the use of Large Language Models (LLMs) in penetration testing, which promises to reduce costs and thus allow for higher frequency. We conduct a review of related work, identifying best practices and common evaluation issues. We then present AutoPentest, an application for performing black-box penetration tests with a high degree of autonomy. AutoPentest is based on the LLM GPT-4o from OpenAI and the LLM agent framework LangChain. It can perform complex multi-step tasks, augmented by external tools and knowledge bases. We conduct a study on three capture-the-flag style Hack The Box (HTB) machines, comparing our implementation AutoPentest with the baseline approach of manually using the ChatGPT-4o user interface. Both approaches are able to complete 15-25% of the subtasks on the HTB machines, with AutoPentest slightly outperforming ChatGPT. We…

Code & Models

Repositories

juliushenke/autopentest (Official)

Taxonomy

Topics: Network Security and Intrusion Detection · Web Application Security Vulnerabilities · Advanced Malware Detection Techniques