Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o   Under Data Scarcity

Dylan Zhang; Justin Wang; Tianran Sun

arXiv:2502.11901·cs.CL·April 15, 2025

Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity

Dylan Zhang, Justin Wang, Tianran Sun

PDF

Open Access

TL;DR

This paper introduces PoPilot, a synthetic data augmentation method for proof-oriented programming, enabling a 14B model to outperform GPT-4o in proof synthesis and repair tasks by significant margins.

Contribution

The paper presents the first synthetic data augmentation approach for project-level proof-oriented programming, improving model performance in proof synthesis and repair.

Findings

01

PoPilot outperforms GPT-4o by 64% in proof-oriented programming tasks.

02

PoPilot improves GPT-4o's performance by 54% through proof repair.

03

Synthetic data augmentation enhances proof reasoning capabilities in language models.

Abstract

Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNumerical Methods and Algorithms