Dafny as Verification-Aware Intermediate Language for Code Generation

Yue Chen Li; Stefan Zetzsche; Siva Somayyajula

arXiv:2501.06283·cs.SE·January 14, 2025

Dafny as Verification-Aware Intermediate Language for Code Generation

Yue Chen Li, Stefan Zetzsche, Siva Somayyajula

PDF

TL;DR

This paper proposes using Dafny as an intermediate language to improve the correctness of code generated by large language models, enabling automatic verification before final code generation.

Contribution

It introduces a verification-aware intermediate language approach, guiding LLMs to produce correct code through formal verification with Dafny before target language compilation.

Findings

01

Prototype demonstrates improved code correctness.

02

Effective verification reduces faulty code generation.

03

Performance on HumanEval benchmarks shows promise.

Abstract

Using large language models (LLMs) to generate source code from natural language prompts is a popular and promising idea with a wide range of applications. One of its limitations is that the generated code can be faulty at times, often in a subtle way, despite being presented to the user as correct. In this paper, we explore ways in which formal methods can assist with increasing the quality of code generated by an LLM. Instead of emitting code in a target language directly, we propose that the user guides the LLM to first generate an opaque intermediate representation, in the verification-aware language Dafny, that can be automatically validated for correctness against agreed on specifications. The correct Dafny program is then compiled to the target language and returned to the user. All user-system interactions throughout the procedure occur via natural language; Dafny code is never…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.