From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

Md Erfan; Md Kamal Hossain Chowdhury; Ahmed Ryan; and Md Rayhanur Rahman

arXiv:2604.22601·cs.SE·April 27, 2026

From Natural Language to Verified Code: Toward AI Assisted Problem-to-Code Generation with Dafny-Based Formal Verification

Md Erfan, Md Kamal Hossain Chowdhury, Ahmed Ryan, and Md Rayhanur Rahman

PDF

TL;DR

This paper demonstrates that open-weight large language models can effectively generate verified code with formal specifications using a tiered prompting strategy and iterative feedback, achieving high verification success rates.

Contribution

It introduces the NL2VC-60 dataset and a tiered prompting approach with self-healing feedback, significantly improving formal verification success in open-weight LLMs.

Findings

01

Gemma 4-31B achieved 90.91% verification success.

02

GPT-OSS 120B improved from 0% to 81.82% success.

03

Structural signatures and iterative prompts greatly enhance verification outcomes.

Abstract

Large Language Models (LLMs) show promise in automated software engineering, yet their guarantee of correctness is frequently undermined by erroneous or hallucinated code. To enforce model honesty, formal verification requires LLMs to synthesize implementation logic alongside formal specifications that are subsequently proven correct by a mathematical verifier. However, the transition from informal natural language to precise formal specification remains an arduous task. Our work addresses this by providing the NaturalLanguage2VerifiedCode (NL2VC)-60 dataset: a collection of 60 complex algorithmic problems. We evaluate 11 randomly selected problem sets across seven open-weight LLMs using a tiered prompting strategy: contextless prompts, signature prompts providing structural anchors, and self-healing prompts utilizing iterative feedback from the Dafny verifier. To address vacuous…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.