ATG: Benchmarking Automated Theorem Generation for Generative Language Models
Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu,, Zhenguo Li, Xiaodan Liang

TL;DR
This paper introduces the ATG benchmark to evaluate language models' ability to generate new, reusable theorems, which enhances their performance in automated theorem proving and advances mathematical reasoning capabilities.
Contribution
The paper presents a novel ATG benchmark constructed from the Metamath library to assess and improve language models' theorem generation for complex mathematical proofs.
Findings
High-quality ATG data improves theorem proving performance
Current models can generate some library theorems but need improvement for advanced theorems
The benchmark reveals gaps in models' ability to produce human-like theorems
Abstract
Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses with the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets: axioms, library, and problem based on their proving depth. We conduct extensive experiments to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
MethodsLib
