Supporting Software Formal Verification with Large Language Models: An Experimental Study
Weiqi Wang, Marie Farrell, Lucas C. Cordeiro, Liping Zhao

TL;DR
This study explores integrating large language models with formal verification tools to improve requirements verification for cyber-physical systems, demonstrating enhanced accuracy and reduced false positives.
Contribution
It introduces SpecVerify, a framework combining LLMs with formal verification, extending expressiveness beyond traditional methods and analyzing LLM limitations.
Findings
Achieves 46.5% verification accuracy on nine systems
Extends assertion expressiveness beyond LTL
Highlights importance of human oversight in verification
Abstract
Formal methods have been employed for requirements verification for a long time. However, it is difficult to automatically derive properties from natural language requirements. SpecVerify addresses this challenge by integrating large language models (LLMs) with formal verification tools, providing a more flexible mechanism for expressing requirements. This framework combines Claude 3.5 Sonnet with the ESBMC verifier to form an automated workflow. Evaluated on nine cyber-physical systems from Lockheed Martin, SpecVerify achieves 46.5% verification accuracy, comparable to NASA's CoCoSim, but with lower false positives. Our framework formulates assertions that extend beyond the expressive power of LTL and identifies falsifiable cases that are missed by more traditional methods. Counterexample analysis reveals CoCoSim's limitations stemming from model connection errors and numerical…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
