LeanTutor: Towards a Verified AI Mathematical Proof Tutor
Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade

TL;DR
LeanTutor is an AI-based proof tutor that combines large language models and theorem provers to provide a verified, natural language interface for learning formal mathematics, aiming to improve correctness and accessibility.
Contribution
The paper introduces LeanTutor, a novel system integrating LLMs with theorem provers for verified mathematical tutoring, and presents PeanoBench, a new dataset for evaluation.
Findings
Successful integration of LLMs and theorem provers in LeanTutor
PeanoBench enables systematic evaluation of proof tutoring systems
LeanTutor demonstrates potential for verified, natural language mathematical education
Abstract
This paper considers the development of an AI-based provably-correct mathematical proof tutor. While Large Language Models (LLMs) allow seamless communication in natural language, they are error prone. Theorem provers such as Lean allow for provable-correctness, but these are hard for students to learn. We present a proof-of-concept system (LeanTutor) by combining the complementary strengths of LLMs and theorem provers. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. To evaluate the system, we introduce PeanoBench, a dataset of 371 Peano Arithmetic proofs in human-written natural language and formal language, derived from the Natural Numbers Game.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIntelligent Tutoring Systems and Adaptive Learning · Mathematics, Computing, and Information Processing · Logic, programming, and type systems
MethodsHierarchical Information Threading
