HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

Azim Ospanov; Zijin Feng; Jiacheng Sun; Haoli Bai; Xin Shen; Farzan Farnia

arXiv:2511.18760·cs.AI·November 25, 2025

HERMES: Towards Efficient and Verifiable Mathematical Reasoning in LLMs

Azim Ospanov, Zijin Feng, Jiacheng Sun, Haoli Bai, Xin Shen, Farzan Farnia

PDF

Open Access

TL;DR

Hermes is a novel tool that combines informal reasoning with formal verification in Lean, significantly improving mathematical reasoning accuracy and efficiency in large language models across various benchmarks.

Contribution

Introduces Hermes, the first agent to interleave informal reasoning with formal proof verification, enhancing accuracy and reducing computational costs in LLM-based mathematical reasoning.

Findings

01

Up to 67% accuracy improvement on AIME'25 dataset.

02

Reduces token usage and computational cost by 80%.

03

Reliable reasoning accuracy improvements across models of varying sizes.

Abstract

Informal mathematics has been central to modern large language model (LLM) reasoning, offering flexibility and enabling efficient construction of arguments. However, purely informal reasoning is prone to logical gaps and subtle errors that are difficult to detect and correct. In contrast, formal theorem proving provides rigorous, verifiable mathematical reasoning, where each inference step is checked by a trusted compiler in systems such as Lean, but lacks the exploratory freedom of informal problem solving. This mismatch leaves current LLM-based math agents without a principled way to combine the strengths of both paradigms. In this work, we introduce Hermes, the first tool-assisted agent that explicitly interleaves informal reasoning with formally verified proof steps in Lean. The framework performs intermediate formal checking to prevent reasoning drift and employs a memory module…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMathematics, Computing, and Information Processing · Logic, programming, and type systems · Machine Learning in Materials Science