VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation
Zichen Xie, Mrigank Pawagi, Yuxin Liu, Aaditi Rai, Lize Shao, John Berberian Jr., Sicong Che, Wenxi Wang

TL;DR
VeriContest is a comprehensive benchmark with 946 problems for evaluating verifiable code generation in Rust, highlighting the gap between code generation and formal verification capabilities of current models.
Contribution
It introduces a large, multi-part benchmark for verifiable code generation, combining natural language, formal specs, proofs, and testing, with a scalable, human-in-the-loop construction pipeline.
Findings
Strongest model achieves 92.18% on code generation but only 5.29% end-to-end verification.
Benchmark reveals significant gaps in specification and proof generation abilities.
VeriContest enables isolated and end-to-end evaluation of verifiable code synthesis.
Abstract
Large language models can generate useful code from natural language, but their outputs come without correctness guarantees. Verifiable code generation offers a path beyond testing by requiring models to produce not only executable code, but also formal specifications and machine-checkable proofs. Progress in this direction, however, is difficult to measure: existing benchmarks are often small, focus on only one part of the pipeline, lack ground-truth proofs or rigorous specification validation, or target verification settings far from mainstream software development. We present VeriContest, a benchmark of 946 competitive-programming problems from LeetCode and Codeforces for verifiable code generation in Rust with Verus. Each problem pairs a natural language description with expert-validated formal specifications, judge-accepted Rust code, Verus-checked proofs, and positive and negative…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
