ATG: Benchmarking Automated Theorem Generation for Generative Language   Models

Xiaohan Lin; Qingxing Cao; Yinya Huang; Zhicheng Yang; Zhengying Liu,; Zhenguo Li; Xiaodan Liang

arXiv:2405.06677·cs.CL·May 14, 2024

ATG: Benchmarking Automated Theorem Generation for Generative Language Models

Xiaohan Lin, Qingxing Cao, Yinya Huang, Zhicheng Yang, Zhengying Liu,, Zhenguo Li, Xiaodan Liang

PDF

Open Access

TL;DR

This paper introduces the ATG benchmark to evaluate language models' ability to generate new, reusable theorems, which enhances their performance in automated theorem proving and advances mathematical reasoning capabilities.

Contribution

The paper presents a novel ATG benchmark constructed from the Metamath library to assess and improve language models' theorem generation for complex mathematical proofs.

Findings

01

High-quality ATG data improves theorem proving performance

02

Current models can generate some library theorems but need improvement for advanced theorems

03

The benchmark reveals gaps in models' ability to produce human-like theorems

Abstract

Humans can develop new theorems to explore broader and more complex mathematical results. While current generative language models (LMs) have achieved significant improvement in automatically proving theorems, their ability to generate new or reusable theorems is still under-explored. Without the new theorems, current LMs struggle to prove harder theorems that are distant from the given hypotheses with the exponentially growing search space. Therefore, this paper proposes an Automated Theorem Generation (ATG) benchmark that evaluates whether an agent can automatically generate valuable (and possibly brand new) theorems that are applicable for downstream theorem proving as reusable knowledge. Specifically, we construct the ATG benchmark by splitting the Metamath library into three sets: axioms, library, and problem based on their proving depth. We conduct extensive experiments to…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

MethodsLib