MINIF2F-DAFNY: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification

Mantas Baksys; Stefan Zetzsche; Olivier Bouissou; Sean B. Holden

arXiv:2512.10187·cs.LG·February 3, 2026

MINIF2F-DAFNY: LLM-Guided Mathematical Theorem Proving via Auto-Active Verification

Mantas Baksys, Stefan Zetzsche, Olivier Bouissou, Sean B. Holden

PDF

Open Access

TL;DR

This paper introduces MINIF2F-DAFNY, a benchmark translating mathematical problems to an auto-active verifier, demonstrating LLMs' effectiveness in guiding theorem proving with automation handling detailed proof steps.

Contribution

We present the first translation of miniF2F to an auto-active verifier, enabling evaluation of LLMs in guiding mathematical theorem proving within Dafny.

Findings

01

Dafny automates 39-44% of problems with empty proofs.

02

LLMs achieve 55.7% success rate in guiding proofs.

03

Effective division of labor: LLMs guide high-level reasoning, automation handles details.

Abstract

LLMs excel at reasoning, but validating their steps remains challenging. Formal verification offers a solution through mechanically checkable proofs. Interactive theorem provers (ITPs) dominate mathematical reasoning but require detailed low-level proof steps, while auto-active verifiers offer automation but focus on software verification. Recent work has begun bridging this divide by evaluating LLMs for software verification in ITPs, but the complementary direction--LLMs for mathematical theorem proving in auto-active verifiers--remains unexplored. We present MINIF2F-DAFNY, the first translation of the widely-used mathematical benchmark miniF2F to an auto-active verifier: Dafny. We find that Dafny's automation alone solves 39-44% of problems with empty proofs, whereas many require substantial proof guidance in ITPs. For remaining problems, we evaluate 7 off-the-shelf LLMs, achieving…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsLogic, programming, and type systems · Formal Methods in Verification · Polynomial and algebraic computation