DafnyPro: LLM-Assisted Automated Verification for Dafny Programs
Debangshu Banerjee, Olivier Bouissou, Stefan Zetzsche

TL;DR
DafnyPro is a framework that leverages large language models to improve automated verification of Dafny programs by adding proof annotations, reducing unnecessary invariants, and applying predefined proof strategies, resulting in higher correctness rates.
Contribution
The paper introduces DafnyPro, a novel LLM-assisted framework that enhances verification annotation generation for Dafny, combining multiple components to significantly improve proof correctness.
Findings
DafnyPro improves proof correctness on four benchmarks.
Claude Sonnet 3.5 with DafnyPro achieves 86% correctness on DafnyBench.
Smaller models fine-tuned with DafnyPro data maintain high verification accuracy.
Abstract
We present DafnyPro, an inference-time framework that enhances LLMs for generating verification annotations in Dafny. DafnyPro comprises three key components: a diff-checker that prevents modifications to base program logic, a pruner that removes unnecessary invariants, and a hint-augmentation system that retrieves and applies predefined, problem-independent proof strategies. We evaluate DafnyPro using Claude Sonnet 3.5 and 3.7 on four benchmarks: Clover, MBPP-Dafny, HumanEval-Dafny, and DafnyBench, achieving consistent performance gains in all cases. Notably, on DafnyBench, the most challenging benchmark, Claude Sonnet 3.5 enhanced with DafnyPro achieves 86% correct proofs, a 16 pp improvement over the base model. We also fine-tune two Qwen models on training data derived from verification attempts by larger models enhanced with DafnyPro. Our 7B and 14B models achieve 68% and 70%…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFormal Methods in Verification · Logic, programming, and type systems · Adversarial Robustness in Machine Learning
