AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection
Zijie Zhao, Chenyuan Yang, Weidong Wang, Yihan Yang, Ziqi Zhang, Lingming Zhang

TL;DR
AnyPoC is a multi-agent framework that automates the generation and validation of proof-of-concept tests for bug reports in large software systems, improving bug detection accuracy and reducing false positives.
Contribution
AnyPoC introduces a multi-agent approach to automated PoC generation and validation that turns static, LLM-generated bug reports into executable evidence for large-scale software.
Findings
Produced 1.3x more valid PoCs than state-of-the-art coding agents.
Rejected 9.8x more false-positive bug reports.
Discovered 122 new bugs, with 45 PoCs adopted as regression tests.
Abstract
While recent LLM-based agents can identify many candidate bugs in source code, their reports remain static hypotheses that require manual validation, limiting the practicality of automated bug detection. We frame this challenge as a test generation task: given a candidate report, synthesize an executable proof-of-concept test (PoC), such as a script, command sequence, or crafted input, that triggers the suspected defect. Automated PoC generation can act as a scalable validation oracle, enabling end-to-end autonomous bug detection by providing concrete execution evidence. However, naive LLM agents are unreliable validators: they are biased toward "success" and may reward-hack by producing plausible but non-functional PoCs or even hallucinated traces. To address this, we present AnyPoC, a general multi-agent framework that (1) analyzes and fact-checks a candidate bug report,…
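The idea of using a PoC as a validation oracle can be illustrated with a small sketch. This is not AnyPoC's implementation, which the truncated abstract does not describe. It only shows the general principle of judging a PoC by re-executing it and inspecting the observed output, rather than trusting a trace reported by an agent. The names `run_poc`, `validate_report`, `PoCResult`, and the `expected_signature` parameter are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class PoCResult:
    exit_code: int   # real exit status of the PoC process (-1 on timeout)
    output: str      # combined stdout and stderr actually observed
    triggered: bool  # True if the expected failure signature appeared


def run_poc(command, expected_signature, timeout_s=30):
    """Execute a candidate PoC and judge it only by observed behavior."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return PoCResult(exit_code=-1, output="timeout", triggered=False)
    output = proc.stdout + proc.stderr
    return PoCResult(proc.returncode, output, expected_signature in output)


def validate_report(command, expected_signature, runs=3):
    """Accept a report only if the signature reproduces on every run."""
    return all(
        run_poc(command, expected_signature).triggered for _ in range(runs)
    )


if __name__ == "__main__":
    # Toy stand-ins for PoCs: one really fails, one is non-functional.
    real_poc = [sys.executable, "-c", "print(1 // 0)"]
    fake_poc = [sys.executable, "-c", "print('ok')"]
    print(validate_report(real_poc, "ZeroDivisionError"))
    print(validate_report(fake_poc, "ZeroDivisionError"))
```

A substring check like this is easy to fool (a PoC could simply print the signature), so it should be read as a starting point; a more robust oracle would compare against independent evidence such as sanitizer reports or the exit status of the target program.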
