Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity
Dylan Zhang, Justin Wang, Tianran Sun

TL;DR
This paper introduces PoPilot, a synthetic data augmentation method for proof-oriented programming, enabling a 14B model to outperform GPT-4o in proof synthesis and repair tasks by significant margins.
Contribution
The paper presents the first synthetic data augmentation approach for project-level proof-oriented programming, improving model performance in proof synthesis and repair.
Findings
PoPilot outperforms GPT-4o by 64% in proof-oriented programming tasks.
PoPilot improves GPT-4o's performance by 54% through proof repair.
Synthetic data augmentation enhances proof reasoning capabilities in language models.
Abstract
Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNumerical Methods and Algorithms
