Theorem Prover as a Judge for Synthetic Data Generation

Joshua Ong Jun Leang; Giwon Hong; Wenda Li; Shay B. Cohen

arXiv:2502.13137·cs.AI·January 19, 2026

Theorem Prover as a Judge for Synthetic Data Generation

Joshua Ong Jun Leang, Giwon Hong, Wenda Li, Shay B. Cohen

PDF

Open Access

TL;DR

This paper introduces a novel framework that uses theorem provers to validate and improve the reasoning of large language models in mathematical tasks, significantly enhancing their accuracy with minimal data.

Contribution

It presents iterative autoformalisation, TP-as-a-Judge, and RLTPF, integrating formal verification into LLM training and evaluation to improve reasoning accuracy.

Findings

01

Increased formalisation success rate from 60% to 87%.

02

Achieved up to 6% accuracy improvements on multiple benchmarks.

03

Reduced reliance on human annotations in reasoning evaluation.

Abstract

The demand for synthetic data in mathematical reasoning has increased due to its potential to enhance the mathematical capabilities of large language models (LLMs). However, ensuring the validity of intermediate reasoning steps remains a significant challenge, affecting data quality. While formal verification via theorem provers effectively validates LLM reasoning, the autoformalisation of mathematical proofs remains error-prone. In response, we introduce iterative autoformalisation, an approach that iteratively refines theorem prover formalisation to mitigate errors, thereby increasing the execution rate on the Lean prover from 60% to 87%. Building upon that, we introduce Theorem Prover as a Judge (TP-as-a-Judge), a method that employs theorem prover formalisation to rigorously assess LLM intermediate reasoning, effectively integrating autoformalisation with synthetic data generation.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSemantic Web and Ontologies · Logic, programming, and type systems · Multi-Agent Systems and Negotiation