Verifying LLM-Generated Code in the Context of Software Verification   with Ada/SPARK

Marcos Cramer; Lucian McIntyre

arXiv:2502.07728·cs.SE·February 12, 2025

Verifying LLM-Generated Code in the Context of Software Verification with Ada/SPARK

Marcos Cramer, Lucian McIntyre

PDF

Open Access

TL;DR

This paper investigates using formal verification with SPARK for Ada to validate LLM-generated code, introducing a tool that generates annotations to enable verification, achieving over 50% correctness in benchmark tests.

Contribution

The paper presents Marmaragan, a novel tool that uses LLMs to generate SPARK annotations for Ada code, facilitating formal verification of LLM-generated programs.

Findings

01

Marmaragan correctly annotated 50.7% of benchmark cases.

02

The approach demonstrates the potential of combining LLMs with formal verification.

03

Benchmark results show promising accuracy for LLM-generated annotations.

Abstract

Large language models (LLMs) have demonstrated remarkable code generation capabilities, but the correctness of the generated code cannot be inherently trusted. This paper explores the feasibility of using formal software verification, specifically the SPARK framework for Ada, to ensure the reliability of LLM-generated code. We present Marmaragan, a tool that leverages an LLM in order to generate SPARK annotations for existing programs, enabling formal verification of the code. The tool is benchmarked on a curated set of SPARK programs, with annotations selectively removed to test specific capabilities. The performance of Marmaragan with GPT-4o on the benchmark is promising, with correct annotations having been generated for 50.7% of the benchmark cases. The results establish a foundation for future work on combining the power of LLMs with the reliability of formal software verification.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSimulation Techniques and Applications · Business Process Modeling and Analysis · Software Reliability and Analysis Research

MethodsSparse Evolutionary Training