Improving Lean4 Autoformalization via Cycle Consistency Fine-tuning
Arsen Shebzukhov

TL;DR
This paper enhances autoformalization of mathematical texts into Lean4 by applying cycle consistency fine-tuning with reinforcement learning, significantly improving preservation of meaning over previous supervised methods.
Contribution
It introduces a cycle consistency reward in reinforcement learning to improve natural language to Lean4 formalization, outperforming supervised fine-tuning methods.
Findings
RL with cycle consistency outperforms SFT in preserving meaning
Cycle consistency correlates with better formalization quality
Curriculum learning shows no significant advantage
Abstract
Autoformalization - automatically translating natural language mathematical texts into formal proof language such as Lean4 - can help accelerate AI-assisted mathematical research, be it via proof verification or proof search. I fine-tune Qwen3.5-2B with LoRA for natural language to Lean4 formalization on FineLeanCorpus and consider three training regimes: supervised fine-tuning (SFT) with curriculum learning (difficulty 1 to 10), SFT without curriculum ordering, and reinforcement learning using group relative policy optimization (GRPO) with a cycle consistency reward. Cycle consistency measures how well the meaning of a statement is preserved through a NL to Lean4 to NL' loop, computed as cosine similarity of off-the-shelf sentence embeddings. On an unseen subset of FineLeanCorpus (FLC) and on PutnamBench, RL substantially outperforms both SFT variants (mean cycle consistency 0.669 vs.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsLogic, programming, and type systems · Mathematics, Computing, and Information Processing · Natural Language Processing Techniques
