Proof Artifact Co-training for Theorem Proving with Language Models

Jesse Michael Han; Jason Rute; Yuhuai Wu; Edward W. Ayers; Stanislas; Polu

arXiv:2102.06203·cs.AI·March 17, 2022·6 cites

Proof Artifact Co-training for Theorem Proving with Language Models

Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas, Polu

PDF

Open Access 4 Repos 1 Models 1 Video

TL;DR

This paper introduces PACT, a co-training methodology that leverages self-supervised data from proof artifacts to enhance theorem proving with language models, significantly improving success rates in formal mathematics.

Contribution

The paper presents PACT, a novel co-training approach that extracts self-supervised proof data to improve Transformer-based theorem proving in formal mathematics.

Findings

01

Proof success rate increased from 32% to 48%.

02

PACT effectively utilizes proof artifacts for self-supervised learning.

03

Method demonstrates improved performance on Lean theorem proving tasks.

Abstract

Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Models

🤗
wellecks/llmstep-mathlib4-pythia2.8b
model· 23 dl· ♡ 6
23 dl♡ 6

Videos

Proof Artifact Co-Training for Theorem Proving with Language Models· slideslive

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques · Software Engineering Research

MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Layer Normalization · Dense Connections · Multi-Head Attention