TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow

Yicheng Tao; Hongteng Xu

arXiv:2601.17332·cs.AI·January 27, 2026

TheoremForge: Scaling up Formal Data Synthesis with Low-Budget Agentic Workflow

Yicheng Tao, Hongteng Xu

PDF

Open Access 1 Datasets

TL;DR

TheoremForge introduces a cost-effective, scalable formal data synthesis pipeline that decomposes formalization into five sub-tasks, recovering valuable training signals from failed attempts to improve proof generation efficiency.

Contribution

It presents a novel decoupled extraction strategy and a five-step workflow that significantly reduces costs and increases data yield for formal mathematics data synthesis.

Findings

01

Achieved a 12.6% verified rate on 2,000 problems, surpassing the baseline of 8.6%.

02

Reduced average cost to $0.481 per successful trajectory.

03

Increased proof generation data yield by 1.6 times.

Abstract

The high cost of agentic workflows in formal mathematics hinders large-scale data synthesis, exacerbating the scarcity of open-source corpora. To address this, we introduce \textbf{TheoremForge}, a cost-effective formal data synthesis pipeline that decomposes the formalization process into five sub-tasks, which are \textit{statement formalization}, \textit{proof generation}, \textit{premise selection}, \textit{proof correction} and \textit{proof sketching}. By implementing a \textit{Decoupled Extraction Strategy}, the workflow recovers valid training signals from globally failed trajectories, effectively utilizing wasted computation. Experiments on a 2,000-problem benchmark demonstrate that TheoremForge achieves a Verified Rate of 12.6\%, surpassing the 8.6\% baseline, at an average cost of only \textbf{$0.481} per successful trajectory using Gemini-3-Flash. Crucially, our strategy…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

timechess/theoremforge
dataset· 9 dl
9 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsLogic, programming, and type systems · Scientific Computing and Data Management · Machine Learning in Materials Science