CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Zhongyuan Peng; Yifan Yao; Kaijing Ma; Shuyue Guo; Yizhe Li; Yichi Zhang; Chenchen Zhang; Yifan Zhang; Zhouliang Yu; Luming Li; Minghao Liu; Yihang Xia; Jiawei Shen; Yuchen Wu; Yixin Cao; Zhaoxiang Zhang; Wenhao Huang; Jiaheng Liu; Ge Zhang

arXiv:2507.06181·cs.CL·July 9, 2025

Open Access · 1 Repo · 3 Models · 3 Datasets

TL;DR

CriticLean introduces a critic-guided reinforcement learning framework that improves the semantic accuracy of translating natural language math statements into formal code, advancing automated theorem proving.

Contribution

The paper presents CriticLeanGPT, a novel critic model trained to evaluate formalizations, and CriticLeanBench, a benchmark for assessing semantic correctness in formalizations.

Findings

01 CriticLeanGPT outperforms existing models in semantic evaluation.
02 The critic-guided approach enhances the reliability of formalizations.
03 A large, diverse dataset supports the framework's effectiveness.

Abstract

Translating natural language mathematical statements into formal, executable code is a fundamental challenge in automated theorem proving. While prior work has focused on generation and compilation success, little attention has been paid to the critic phase: the evaluation of whether generated formalizations truly capture the semantic intent of the original problem. In this paper, we introduce CriticLean, a novel critic-guided reinforcement learning framework that elevates the role of the critic from a passive validator to an active learning component. Specifically, we first propose CriticLeanGPT, trained via supervised fine-tuning and reinforcement learning, to rigorously assess the semantic fidelity of Lean 4 formalizations. We then introduce CriticLeanBench, a benchmark designed to measure models' ability to distinguish semantically correct from incorrect formalizations, and…
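To make the gap between compilation success and semantic fidelity concrete, here is a minimal Lean 4 sketch. It is an illustration only and is not taken from the paper or its datasets: the informal statement and both theorem names are hypothetical. Both declarations type-check (with a `sorry` warning), but only the first one states what the informal problem asks.

```lean
-- Hypothetical informal statement (for illustration only):
--   "The sum of two even natural numbers is even."

-- Faithful formalization: the conclusion is about the sum a + b.
theorem sum_of_evens_is_even (a b : Nat)
    (ha : a % 2 = 0) (hb : b % 2 = 0) : (a + b) % 2 = 0 := by
  sorry -- proof omitted; only the statement is being judged

-- Unfaithful formalization: it still type-checks, but the conclusion
-- is about the product a * b, so it does not capture the informal claim.
theorem sum_of_evens_is_even_wrong (a b : Nat)
    (ha : a % 2 = 0) (hb : b % 2 = 0) : (a * b) % 2 = 0 := by
  sorry -- proof omitted; only the statement is being judged
```

A compiler check alone accepts both statements. Telling them apart requires comparing each statement against the informal problem, which is the kind of judgment the abstract assigns to the critic.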

Code & Models

Repositories

multimodal-art-projection/criticlean (official)

Taxonomy

Topics: Model Reduction and Neural Networks · Machine Learning in Materials Science · Topic Modeling