FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning
Haoran Ding, Zhaoguo Wang, Haibo Chen

TL;DR
FM-Agent is a novel framework that uses LLMs to automatically generate function specifications and perform compositional reasoning on large systems, effectively identifying bugs with minimal human effort.
Contribution
It introduces a top-down, LLM-based approach to automatically generate specifications from natural language and reason about large-scale systems, reducing human burden.
Findings
Successfully reasoned about systems with up to 143k LoC within 2 days.
Discovered 522 new bugs in tested large-scale systems.
Automatically generated test cases to confirm and explain bugs.
Abstract
LLM-assisted software development has become increasingly prevalent, and can generate large-scale systems, such as compilers. It becomes crucial to strengthen the correctness of the generated code. However, automated reasoning for large-scale systems remains challenging due to code complexity. Hoare logic offers an approach to decomposing a large system into smaller components and reasoning about them separately (i.e., compositional reasoning). However, existing works still struggle to scale, because Hoare logic requires writing formal specifications for each function, imposing a heavy human burden. The problem is exacerbated when code is generated by LLMs, as developers lack a deep understanding of each function's expected behavior. This paper presents FM-Agent, the first framework that realizes automated compositional reasoning for large-scale systems. Leveraging LLMs, FM-Agent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
