Evaluating LLM-Generated ACSL Annotations for Formal Verification

Arshad Beg; Diarmuid O'Donoghue; Rosemary Monahan

arXiv:2602.13851·cs.SE·April 15, 2026

Evaluating LLM-Generated ACSL Annotations for Formal Verification

Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

PDF

TL;DR

This paper empirically evaluates automated ACSL annotation methods for C programs, comparing rule-based, plugin, and large language model approaches in the context of formal verification.

Contribution

It provides a comparative analysis of LLM-based and rule-based annotation strategies for C program verification, highlighting current limitations and potential of LLMs.

Findings

01

Rule-based methods are more reliable for verification success.

02

LLM-based approaches show variable performance and limitations.

03

The study uses the CASP benchmark and Frama-C's WP plugin for evaluation.

Abstract

Formal specifications are crucial for building verifiable and dependable software systems, yet generating accurate and verifiable specifications for real-world C programs remains challenging. This paper presents an empirical evaluation of automated ACSL annotation generation strategies for C programs, comparing a rule-based Python script, Frama-C's RTE plugin, and three large language models (DeepSeek-V3.2, GPT-5.2, and OLMo 3.1 32B Instruct). The study focuses on one-shot annotation generation, assessing how these approaches perform when directly applied to verification tasks. Using a filtered subset of the CASP benchmark, we evaluate generated annotations through Frama-C's WP plugin with multiple SMT solvers, analyzing proof success rates, solver timeouts, and internal processing time. Our results show that rule-based approaches remain more reliable for verification success, while…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.