Getting More out of Large Language Models for Proofs
Shizhuo Dylan Zhang, Talia Ringer, Emily First

TL;DR
This paper investigates the limitations of large language models in formal theorem proving, analyzing failure cases to improve their effectiveness and accessibility in proof tasks.
Contribution
It provides an analysis of failure modes of language models in proof tasks and offers insights to enhance their performance in formal theorem proving.
Findings
Identified common failure patterns in language models for proofs
Insights into prompt design to mitigate failure cases
Guidelines for improving model reliability in proof tasks
Abstract
Large language models have the potential to simplify formal theorem proving and make it more accessible. But how to get the most out of these models is still an open question. To answer this question, we take a step back and explore the failure cases of these models using common prompting-based techniques. Our talk will discuss these failure cases and what they can teach us about how to get more out of these models.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Mathematics, Computing, and Information Processing · Logic, programming, and type systems
