PenForge: On-the-Fly Expert Agent Construction for Automated Penetration Testing

Huihui Huang; Jieke Shi; Junkai Chen; Ting Zhang; Yikun Li; Chengran Yang; Eng Lieh Ouh; Lwin Khin Shar; David Lo

arXiv:2601.06910 · cs.SE · January 13, 2026

Open Access

TL;DR

PenForge is a framework that constructs expert agents dynamically during penetration testing. By combining automated reconnaissance with context-aware exploitation, it reaches a 30.0% exploit success rate (12/40) on CVE-Bench in the zero-day setting, a threefold improvement over the state of the art.

Contribution

The paper presents the first framework for on-the-fly expert agent construction in penetration testing, improving adaptability and effectiveness over single generic agents and specialized agents prepared in advance.

Findings

1. Achieved a 30.0% exploit success rate (12/40) on CVE-Bench in the zero-day setting.

2. Delivered a threefold improvement in exploit success rate over the state of the art.

3. Identified future opportunities for richer tool usage and explainability.
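
The headline figures can be checked with a few lines of arithmetic. The sketch below recomputes the 30.0% rate from the 12/40 count given in the abstract. The baseline it derives is an assumption: it reads "3 times" as a plain ratio of success rates on the same 40 tasks, and that baseline figure is not stated on this page. The function names are illustrative only.

```python
from fractions import Fraction


def success_rate(solved: int, total: int) -> Fraction:
    """Return the exact fraction of tasks solved."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= solved <= total:
        raise ValueError("solved must be between 0 and total")
    return Fraction(solved, total)


def format_rate(rate: Fraction) -> str:
    """Format a rate as a percentage with one decimal place."""
    return f"{float(rate):.1%}"


# Reported in the abstract: 12 successful exploits out of 40 tasks.
penforge = success_rate(12, 40)
print(format_rate(penforge))  # 30.0%

# ASSUMPTION: "3 times improvement" is a ratio of success rates measured
# on the same 40 tasks. The implied baseline is derived, not reported.
implied_baseline = penforge / 3
implied_baseline_solved = implied_baseline * 40
print(format_rate(implied_baseline), implied_baseline_solved)
```

Using `Fraction` keeps the ratio exact, so the derived count comes out as a whole number rather than a rounded float.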

Abstract

Penetration testing is essential for identifying vulnerabilities in web applications before real adversaries can exploit them. Recent work has explored automating this process with Large Language Model (LLM)-powered agents, but existing approaches rely either on a single generic agent that struggles in complex scenarios or on narrowly specialized agents that cannot adapt to diverse vulnerability types. We therefore introduce PenForge, a framework that dynamically constructs expert agents during testing rather than relying on agents prepared beforehand. By integrating automated reconnaissance of potential attack surfaces with agents instantiated on the fly for context-aware exploitation, PenForge achieves a 30.0% exploit success rate (12/40) on CVE-Bench in the particularly challenging zero-day setting, a threefold improvement over the state of the art. Our analysis also identifies…
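
To make the idea of "agents instantiated on the fly" concrete, here is a minimal sketch of how reconnaissance output might be turned into agent specifications at test time. It is not PenForge's implementation: the page does not describe the framework's internals, so every name here (`ReconFinding`, `ExpertAgentSpec`, `build_expert`, `build_experts`) and the prompt wording are hypothetical. The sketch only assembles configuration data and performs no testing itself.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReconFinding:
    """Hypothetical record of one attack-surface observation."""
    endpoint: str
    observation: str
    suspected_class: str  # e.g. "sql-injection"


@dataclass(frozen=True)
class ExpertAgentSpec:
    """Hypothetical description of an agent built for one finding."""
    name: str
    system_prompt: str
    context: Tuple[str, ...]


def build_expert(finding: ReconFinding) -> ExpertAgentSpec:
    """Derive an agent spec from a finding instead of picking a pre-built one."""
    prompt = (
        f"You are a penetration-testing specialist in {finding.suspected_class}. "
        f"Work only against the authorized endpoint {finding.endpoint}."
    )
    return ExpertAgentSpec(
        name=f"{finding.suspected_class}-expert",
        system_prompt=prompt,
        context=(finding.observation,),
    )


def build_experts(findings: Iterable[ReconFinding]) -> List[ExpertAgentSpec]:
    """Build one expert spec per reconnaissance finding."""
    return [build_expert(f) for f in findings]


# Illustrative input only; these findings are made up for the example.
findings = [
    ReconFinding("/login", "error message echoes SQL syntax", "sql-injection"),
    ReconFinding("/upload", "accepts arbitrary file extensions", "file-upload"),
]
experts = build_experts(findings)
for spec in experts:
    print(spec.name)
```

The contrast with the approaches the abstract criticizes is that the set of experts is not fixed in advance: the specs are a function of what reconnaissance observed on the target.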

Taxonomy

Topics: Web Application Security Vulnerabilities · Information and Cyber Security · Advanced Malware Detection Techniques