PenForge: On-the-Fly Expert Agent Construction for Automated Penetration Testing
Huihui Huang, Jieke Shi, Junkai Chen, Ting Zhang, Yikun Li, Chengran Yang, Eng Lieh Ouh, Lwin Khin Shar, David Lo

TL;DR
PenForge is a framework that constructs expert agents dynamically during penetration testing. By combining automated reconnaissance with context-aware exploitation, it raises exploit success rates in complex, zero-day vulnerability scenarios.
Contribution
It presents the first framework for on-the-fly expert agent construction in penetration testing, enhancing adaptability and effectiveness over static or generic approaches.
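As an illustration of what "on-the-fly expert agent construction" could look like, here is a minimal sketch that turns a reconnaissance finding into an agent configuration. It is not PenForge's implementation: the names `ReconFinding`, `ExpertAgentSpec`, `TOOLS_BY_CLASS`, and `build_expert`, along with the tool names and prompt wording, are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReconFinding:
    """One reconnaissance observation (hypothetical shape, not from the paper)."""
    endpoint: str
    suspected_class: str  # e.g. "sql_injection" or "path_traversal"


@dataclass
class ExpertAgentSpec:
    """Configuration for an agent assembled at test time (hypothetical shape)."""
    role: str
    system_prompt: str
    tools: list = field(default_factory=list)


# Hypothetical mapping from a suspected vulnerability class to tool names.
TOOLS_BY_CLASS = {
    "sql_injection": ["http_client", "sql_probe"],
    "path_traversal": ["http_client", "path_wordlist"],
}


def build_expert(finding: ReconFinding) -> ExpertAgentSpec:
    """Build an agent spec from a finding instead of picking a pre-made agent."""
    speciality = finding.suspected_class.replace("_", " ")
    # Unknown classes fall back to a generic tool set.
    tools = list(TOOLS_BY_CLASS.get(finding.suspected_class, ["http_client"]))
    prompt = (
        f"You are a penetration tester specializing in {speciality}. "
        f"Focus on the endpoint {finding.endpoint} within the authorized test scope."
    )
    return ExpertAgentSpec(role=f"{speciality} expert", system_prompt=prompt, tools=tools)


spec = build_expert(ReconFinding(endpoint="/login", suspected_class="sql_injection"))
```

The point of the sketch is the direction of data flow: the agent's role, prompt, and tools are derived from what reconnaissance observed, rather than selected from a fixed roster prepared before testing.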
Findings
Achieved a 30.0% exploit success rate on CVE-Bench in zero-day settings.
Tripled the exploit success rate of the prior state-of-the-art method.
Identified future opportunities for richer tool usage and explainability.
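The headline numbers can be checked with a few lines of arithmetic. The sketch below uses the 12 of 40 figure from the abstract and the stated threefold improvement; the baseline rate it derives is only what those two numbers imply, not a figure reported on this page.

```python
from fractions import Fraction

# Figures stated in the abstract: 12 successful exploits out of 40 targets.
solved, total = 12, 40
rate = Fraction(solved, total)
rate_percent = float(rate * 100)

# The abstract describes this as a 3 times improvement over the state of the art.
improvement_factor = 3

# Implied baseline (an inference from the two numbers above, not a reported value).
implied_baseline = rate / improvement_factor
implied_baseline_percent = float(implied_baseline * 100)

print(f"PenForge success rate: {rate_percent:.1f}%")
print(f"Implied baseline rate: {implied_baseline_percent:.1f}%")
```

Exact fractions are used so the percentages are not affected by floating-point rounding before formatting.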
Abstract
Penetration testing is essential for identifying vulnerabilities in web applications before real adversaries can exploit them. Recent work has explored automating this process with Large Language Model (LLM)-powered agents, but existing approaches rely either on a single generic agent that struggles in complex scenarios or on narrowly specialized agents that cannot adapt to diverse vulnerability types. We therefore introduce PenForge, a framework that dynamically constructs expert agents during testing rather than relying on agents prepared beforehand. By integrating automated reconnaissance of potential attack surfaces with agents instantiated on the fly for context-aware exploitation, PenForge achieves a 30.0% exploit success rate (12/40) on CVE-Bench in the particularly challenging zero-day setting, a threefold improvement over the state of the art. Our analysis also identifies…
Taxonomy
Topics: Web Application Security Vulnerabilities · Information and Cyber Security · Advanced Malware Detection Techniques
