FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

Haoran Ding; Zhaoguo Wang; Haibo Chen

arXiv:2604.11556·cs.SE·April 14, 2026

FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

Haoran Ding, Zhaoguo Wang, Haibo Chen

PDF

TL;DR

FM-Agent is a novel framework that uses LLMs to automatically generate function specifications and perform compositional reasoning on large systems, effectively identifying bugs with minimal human effort.

Contribution

It introduces a top-down, LLM-based approach to automatically generate specifications from natural language and reason about large-scale systems, reducing human burden.

Findings

01

Successfully reasoned about systems with up to 143k LoC within 2 days.

02

Discovered 522 new bugs in tested large-scale systems.

03

Automatically generated test cases to confirm and explain bugs.

Abstract

LLM-assisted software development has become increasingly prevalent, and can generate large-scale systems, such as compilers. It becomes crucial to strengthen the correctness of the generated code. However, automated reasoning for large-scale systems remains challenging due to code complexity. Hoare logic offers an approach to decomposing a large system into smaller components and reasoning about them separately (i.e., compositional reasoning). However, existing works still struggle to scale, because Hoare logic requires writing formal specifications for each function, imposing a heavy human burden. The problem is exacerbated when code is generated by LLMs, as developers lack a deep understanding of each function's expected behavior. This paper presents FM-Agent, the first framework that realizes automated compositional reasoning for large-scale systems. Leveraging LLMs, FM-Agent…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.