Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Saikat Chakraborty, Gabriel Ebner, Siddharth Bhat, Sarah Fakhoury,, Sakina Fatima, Shuvendu Lahiri, and Nikhil Swamy

TL;DR
This paper presents a large dataset of SMT-assisted proof-oriented programs in F*, enabling AI-driven program synthesis, and demonstrates that smaller fine-tuned models can perform comparably to larger models with lower costs.
Contribution
The creation of the largest corpus of SMT-assisted F* programs and proofs, along with an analysis of AI models for program synthesis in this domain.
Findings
Smaller models like Phi-2 and StarCoder perform comparably to GPT-4.
Type-based retrieval significantly improves synthesis performance.
The dataset enables future research in AI-assisted proof-oriented programming.
Abstract
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem producing a definition given a formal specification expressed as an F* type. We provide a program fragment checker that queries F* to check the correctness of candidate solutions. We also report on an extended version of our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsManufacturing Process and Optimization
