Agentic Verification of Software Systems
Haoxin Tu, Huan Zhao, Yahui Song, Mehtab Zafar, Ruijie Meng, Abhik Roychoudhury

TL;DR
This paper introduces AutoRocq, an autonomous LLM agent that iteratively verifies software programs by collaborating with the Rocq theorem prover, demonstrating promising results on benchmarks and Linux kernel modules.
Contribution
It presents the first LLM-based program verification agent that learns on-the-fly and collaborates with a theorem prover for proof refinement, unlike prior training-dependent methods.
Findings
Effective automated verification on SV-COMP benchmarks.
Successful verification of Linux kernel modules.
Autonomous proof refinement improves verification accuracy.
Abstract
Automatically generated code is gaining traction recently, owing to the prevalence of Large Language Models (LLMs). Further, the AlphaProof initiative has demonstrated the possibility of using AI for general mathematical reasoning. Reasoning about computer programs (software) can be accomplished via general mathematical reasoning; however, it tends to be more structured and richer in contexts. This forms an attractive proposition, since then AI agents can be used to reason about voluminous code that gets generated by AI. In this work, we present a first LLM agent, AutoRocq, for conducting program verification. Unlike past works, which rely on extensive training of LLMs on proof examples, our agent learns on-the-fly and improves the proof via an iterative refinement loop. The iterative improvement of the proof is achieved by the proof agent communicating with the Rocq (formerly Coq)…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
