VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

Zichen Xie; Mrigank Pawagi; Yuxin Liu; Aaditi Rai; Lize Shao; John Berberian Jr.; Sicong Che; Wenxi Wang

arXiv:2605.08553·cs.SE·May 12, 2026

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

Zichen Xie, Mrigank Pawagi, Yuxin Liu, Aaditi Rai, Lize Shao, John Berberian Jr., Sicong Che, Wenxi Wang

PDF

TL;DR

VeriContest is a comprehensive benchmark with 946 problems for evaluating verifiable code generation in Rust, highlighting the gap between code generation and formal verification capabilities of current models.

Contribution

It introduces a large, multi-part benchmark for verifiable code generation, combining natural language, formal specs, proofs, and testing, with a scalable, human-in-the-loop construction pipeline.

Findings

01

Strongest model achieves 92.18% on code generation but only 5.29% end-to-end verification.

02

Benchmark reveals significant gaps in specification and proof generation abilities.

03

VeriContest enables isolated and end-to-end evaluation of verifiable code synthesis.

Abstract

Large language models can generate useful code from natural language, but their outputs come without correctness guarantees. Verifiable code generation offers a path beyond testing by requiring models to produce not only executable code, but also formal specifications and machine-checkable proofs. Progress in this direction, however, is difficult to measure: existing benchmarks are often small, focus on only one part of the pipeline, lack ground-truth proofs or rigorous specification validation, or target verification settings far from mainstream software development. We present VeriContest, a benchmark of 946 competitive-programming problems from LeetCode and Codeforces for verifiable code generation in Rust with Verus. Each problem pairs a natural language description with expert-validated formal specifications, judge-accepted Rust code, Verus-checked proofs, and positive and negative…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.