Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages
Shuvendu K. Lahiri

TL;DR
This paper proposes a new automated, symbolic testing approach to evaluate the correctness of user-provided formal specifications in verification-aware programming languages, addressing limitations of previous dynamic testing methods.
Contribution
It introduces a symbolic testing metric for specifications in verification languages like Dafny, enabling better evaluation of user intent formalization compared to prior mutant-based methods.
Findings
Automated metric closely matches human-labeled data
Method reveals cases where human labels are imperfect
Highlights formal verification challenges for broader application
Abstract
Verification-aware programming languages such as Dafny and F* provide means to formally specify and prove properties of a program. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the {\it user-intent formalization for programs}, expressed as a formal specification. This is because intent or requirement is expressed {\it informally} in natural language and the specification is a formal artefact. Despite, the advent of large language models (LLMs) has made tremendous strides bridging the gap between informal intent and formal program implementations recently, driven in large parts by benchmarks and automated metrics for evaluation. Recent work has proposed a framework for evaluating the {\it user-intent formalization} problem for mainstream programming…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsAttention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
