HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs
Azim Ospanov, Zijin Feng, Jiacheng Sun, Haoli Bai, Xin Shen, Farzan Farnia

TL;DR
Hermes is a novel tool that combines informal reasoning with formal verification in Lean, significantly improving mathematical reasoning accuracy and efficiency in large language models across various benchmarks.
Contribution
Introduces Hermes, the first agent to interleave informal reasoning with formal proof verification, enhancing accuracy and reducing computational costs in LLM-based mathematical reasoning.
Findings
Up to 67% accuracy improvement on AIME'25 dataset.
Reduces token usage and computational cost by 80%.
Reliable reasoning accuracy improvements across models of varying sizes.
Abstract
Informal mathematics has been central to modern large language model (LLM) reasoning, offering flexibility and enabling efficient construction of arguments. However, purely informal reasoning is prone to logical gaps and subtle errors that are difficult to detect and correct. In contrast, formal theorem proving provides rigorous, verifiable mathematical reasoning, where each inference step is checked by a trusted compiler in systems such as Lean, but lacks the exploratory freedom of informal problem solving. This mismatch leaves current LLM-based math agents without a principled way to combine the strengths of both paradigms. In this work, we introduce Hermes, the first tool-assisted agent that explicitly interleaves informal reasoning with formally verified proof steps in Lean. The framework performs intermediate formal checking to prevent reasoning drift and employs a memory module…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing · Logic, programming, and type systems · Machine Learning in Materials Science
