Execution-State-Aware LLM Reasoning for Automated Proof-of-Vulnerability Generation
Haoyu Li, Xijia Che, Yanhao Wang, Xiaojing Liao, Luyi Xing

TL;DR
This paper introduces DrillAgent, an execution-state-aware framework for automated proof-of-vulnerability (PoV) generation that grounds LLM reasoning in concrete execution feedback and solves up to 52.8% more CVE tasks than existing LLM baselines.
Contribution
The paper presents a novel iterative hypothesis-verification-refinement approach that combines LLM semantic inference with execution state feedback for more accurate PoV generation.
Findings
DrillAgent outperforms existing LLM baselines by solving up to 52.8% more CVE tasks.
The framework effectively bridges static reasoning and dynamic execution for vulnerability proof generation.
Experimental results demonstrate the importance of execution-state-awareness in complex software security tasks.
Abstract
Proof-of-Vulnerability (PoV) generation is a critical task in software security, serving as a cornerstone for vulnerability validation, false positive reduction, and patch verification. While directed fuzzing effectively drives path exploration, satisfying complex semantic constraints remains a persistent bottleneck in automated exploit generation. Large Language Models (LLMs) offer a promising alternative with their semantic reasoning capabilities; however, existing LLM-based approaches lack sufficient grounding in concrete execution behavior, limiting their ability to generate precise PoVs. In this paper, we present DrillAgent, an agentic framework that reformulates PoV generation as an iterative hypothesis-verification-refinement process. To bridge the gap between static reasoning and dynamic execution, DrillAgent synergizes LLM-based semantic inference with feedback from concrete…
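The iterative hypothesis-verification-refinement process described in the abstract can be illustrated with a minimal sketch. This is not DrillAgent's implementation: the target program, the `Feedback` record, and the rule-based `refine` function (a stand-in for the LLM step) are all hypothetical, chosen only to show how feedback from a concrete run can steer the next candidate input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feedback:
    """Hypothetical execution-state summary returned by one concrete run."""
    crashed: bool
    reached: str  # name of the last check the input reached


def toy_target(data: bytes) -> Feedback:
    """Hypothetical target: 'crashes' only when three constraints hold."""
    if data[:2] != b"PV":                          # magic header
        return Feedback(False, "magic_check")
    if len(data) < 3 or data[2] != len(data) - 3:  # length field matches body
        return Feedback(False, "length_check")
    if len(data) - 3 <= 4:                         # body must exceed 4 bytes
        return Feedback(False, "size_check")
    return Feedback(True, "vulnerable_copy")


def refine(candidate: bytes, fb: Feedback) -> bytes:
    """Stand-in for the LLM step: revise the hypothesis using the failed check."""
    if fb.reached == "magic_check":
        return b"PV" + candidate[2:]
    if fb.reached == "length_check":
        body = candidate[3:]
        return candidate[:2] + bytes([len(body)]) + body
    if fb.reached == "size_check":
        body = candidate[3:] + b"A" * 4
        return candidate[:2] + bytes([len(body)]) + body
    return candidate


def generate_pov(
    seed: bytes,
    run: Callable[[bytes], Feedback],
    refine_fn: Callable[[bytes, Feedback], bytes],
    max_iters: int = 10,
) -> Optional[bytes]:
    """Hypothesize an input, verify it by execution, refine it on failure."""
    candidate = seed
    for _ in range(max_iters):
        fb = run(candidate)          # verification: concrete execution
        if fb.crashed:
            return candidate         # input triggers the vulnerability
        candidate = refine_fn(candidate, fb)  # refinement from feedback
    return None                      # iteration budget exhausted


pov = generate_pov(b"\x00\x00\x00A", toy_target, refine)
```

In this toy setting each run reveals which constraint blocked progress, so the loop converges in a few iterations. A real system would replace `toy_target` with an instrumented run of the program under test and `refine` with a model call, neither of which is specified in the text above.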
Taxonomy
Topics: Software Testing and Debugging Techniques · Security and Verification in Computing · Information and Cyber Security
