RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models
Hanzheng Dai, Yuanliang Li, Jun Yan, Zhibo Zhang

TL;DR
RefPentester is a knowledge-informed, self-reflective penetration testing framework built on LLMs. It improves automation, adaptability, and success rates in identifying vulnerabilities, and it outperforms the baseline GPT-4o model.
Contribution
The paper introduces RefPentester, a novel LLM-based framework that incorporates self-reflection and knowledge guidance to enhance automated penetration testing.
Findings
RefPentester successfully revealed credentials on Hack The Box's Sau machine.
It outperformed the baseline GPT-4o model by 16.7% in success rate.
It also demonstrated superior success rates across the different PT stages.
Abstract
Automated penetration testing (AutoPT) powered by large language models (LLMs) has gained attention for its ability to automate ethical hacking processes and identify vulnerabilities in target systems by leveraging the inherent knowledge of LLMs. However, existing LLM-based AutoPT frameworks often underperform compared to human experts in challenging tasks for several reasons: the imbalanced knowledge used in LLM training, short-sightedness in the planning process, and hallucinations during command generation. Moreover, the trial-and-error nature of the PT process is constrained by existing frameworks lacking mechanisms to learn from previous failures, restricting adaptive improvement of PT strategies. To address these limitations, we propose a knowledge-informed, self-reflective PT framework powered by LLMs, called RefPentester. This AutoPT framework is designed to assist human…
Taxonomy
Topics: Topic Modeling
Methods: Softmax · Attention Is All You Need
