Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis
Ke Zhang, Patricio Gallardo, Maziar Raissi, Sudhir Murthy

TL;DR
This paper systematically analyzes how different tool categories enhance large language models in translating informal math into formal Lean 4 code, showing significant improvements in success and fidelity.
Contribution
It introduces a factorial analysis of three tool types for tool-augmented agents, quantifying their individual impact on formalization performance.
Findings
Benchmarking shows large gains over one-shot baselines.
Factorial analysis isolates the contribution of each tool category.
Tool augmentation significantly improves code compilation and semantic fidelity.
Abstract
Automatic translation of natural language mathematics into faithful Lean 4 code is hindered by the fundamental dissonance between informal set-theoretic intuition and strict formal type theory. This gap often causes LLMs to hallucinate non-existent library definitions, resulting in code that fails to compile or lacks semantic fidelity. In this work, we investigate the effectiveness of tool-augmented agents for this task through a systematic factorial analysis of three distinct tool categories: Fine-tuned Model Querying (accessing expert drafts), Knowledge Search (retrieving symbol definitions), and Compiler Feedback (verifying code via a Lean REPL). We first benchmark the agent against one-shot baselines, demonstrating large gains in both compilation success and semantic equivalence. We then use the factorial decomposition to quantify the impact of each category, isolating the marginal…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
