Proof Artifact Co-training for Theorem Proving with Language Models
Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward W. Ayers, Stanislas, Polu

TL;DR
This paper introduces PACT, a co-training methodology that leverages self-supervised data from proof artifacts to enhance theorem proving with language models, significantly improving success rates in formal mathematics.
Contribution
The paper presents PACT, a novel co-training approach that extracts self-supervised proof data to improve Transformer-based theorem proving in formal mathematics.
Findings
Proof success rate increased from 32% to 48%.
PACT effectively utilizes proof artifacts for self-supervised learning.
Method demonstrates improved performance on Lean theorem proving tasks.
Abstract
Labeled data for imitation learning of theorem proving in large libraries of formalized mathematics is scarce as such libraries require years of concentrated effort by human specialists to be built. This is particularly challenging when applying large Transformer language models to tactic prediction, because the scaling of performance with respect to model size is quickly disrupted in the data-scarce, easily-overfitted regime. We propose PACT ({\bf P}roof {\bf A}rtifact {\bf C}o-{\bf T}raining), a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective. We apply this methodology to Lean, an interactive proof assistant which hosts some of the most sophisticated formalized mathematics to date. We instrument Lean with a neural theorem prover driven by a Transformer language model and show…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Software Engineering Research
MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Layer Normalization · Dense Connections · Multi-Head Attention
