Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing

Masaya Kobayashi; Masane Fuchi; Amar Zanashir; Tomonori Yoneda; Tomohiro Takagi

arXiv:2502.15506·cs.CR·February 24, 2025

TL;DR

This paper presents a semi-autonomous system utilizing multiple large language models to execute complex cybersecurity workflows, including attack strategy formulation and command generation, demonstrated on Hack The Box virtual machines.

Contribution

It introduces a novel multi-LLM-based framework for semi-autonomous penetration testing, addressing the reasoning and domain-knowledge limitations of LLMs in cybersecurity.

Findings

01 System can autonomously construct attack strategies

02 Reduces manual intervention in penetration testing

03 Effective on Hack The Box virtual environments

Abstract

With the emergence of high-performance large language models (LLMs) such as GPT, Claude, and Gemini, the autonomous and semi-autonomous execution of tasks has significantly advanced across various domains. However, in highly specialized fields such as cybersecurity, full autonomy remains a challenge. This difficulty primarily stems from the limitations of LLMs in reasoning capabilities and domain-specific knowledge. We propose a system that semi-autonomously executes complex cybersecurity workflows by employing multiple LLM modules to formulate attack strategies, generate commands, and analyze results, thereby addressing the aforementioned challenges. In our experiments using Hack The Box virtual machines, we confirmed that our system can autonomously construct attack strategies, issue appropriate commands, and automate certain processes, thereby reducing the need for manual…
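The division of labor described in the abstract (one module formulates a strategy, one generates a command, one analyzes the result) can be pictured as a simple loop with a human approval gate. The following is only an illustrative sketch and not the authors' implementation: the class name SemiAutonomousLoop, the Step record, the approve and execute hooks, and build_demo_loop are all hypothetical, and the three modules are plain stub functions rather than real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One pass through the loop (hypothetical record type)."""
    strategy: str
    command: str
    output: str
    analysis: str


@dataclass
class SemiAutonomousLoop:
    # Each callable stands in for one module; in the paper these roles are
    # filled by LLMs, here they are assumptions supplied by the caller.
    plan: Callable[[List[Step]], str]      # strategy formulation
    make_command: Callable[[str], str]     # command generation
    analyze: Callable[[str, str], str]     # result analysis
    approve: Callable[[str], bool]         # human operator gate (assumed)
    execute: Callable[[str], str]          # runs the command in a lab sandbox
    history: List[Step] = field(default_factory=list)

    def run(self, max_steps: int = 3) -> List[Step]:
        for _ in range(max_steps):
            strategy = self.plan(self.history)
            command = self.make_command(strategy)
            # "Semi-autonomous": nothing runs without operator approval.
            if not self.approve(command):
                break
            output = self.execute(command)
            analysis = self.analyze(command, output)
            self.history.append(Step(strategy, command, output, analysis))
        return self.history


def build_demo_loop(approve_all: bool = True) -> SemiAutonomousLoop:
    """Wire the loop with harmless stubs so the control flow can be inspected."""
    return SemiAutonomousLoop(
        plan=lambda history: f"strategy-{len(history) + 1}",
        make_command=lambda strategy: f"echo {strategy}",
        analyze=lambda command, output: f"observed: {output}",
        approve=lambda command: approve_all,
        # Stub executor: pretends to run "echo X" and returns X.
        execute=lambda command: command.replace("echo ", ""),
    )
```

Calling build_demo_loop().run(max_steps=2) walks the plan, generate, approve, execute, analyze cycle twice and returns the recorded steps. Passing approve_all=False stops the loop before anything is executed, which is the point at which a human operator would intervene in a semi-autonomous setup.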
