Evaluating LLM-driven User-Intent Formalization for Verification-Aware   Languages

Shuvendu K. Lahiri

arXiv:2406.09757·cs.PL·October 17, 2024

Evaluating LLM-driven User-Intent Formalization for Verification-Aware Languages

Shuvendu K. Lahiri

PDF

TL;DR

This paper proposes a new automated, symbolic testing approach to evaluate the correctness of user-provided formal specifications in verification-aware programming languages, addressing limitations of previous dynamic testing methods.

Contribution

It introduces a symbolic testing metric for specifications in verification languages like Dafny, enabling better evaluation of user intent formalization compared to prior mutant-based methods.

Findings

01

Automated metric closely matches human-labeled data

02

Method reveals cases where human labels are imperfect

03

Highlights formal verification challenges for broader application

Abstract

Verification-aware programming languages such as Dafny and F* provide means to formally specify and prove properties of a program. Although the problem of checking an implementation against a specification can be defined mechanically, there is no algorithmic way of ensuring the correctness of the {\it user-intent formalization for programs}, expressed as a formal specification. This is because intent or requirement is expressed {\it informally} in natural language and the specification is a formal artefact. Despite, the advent of large language models (LLMs) has made tremendous strides bridging the gap between informal intent and formal program implementations recently, driven in large parts by benchmarks and automated metrics for evaluation. Recent work has proposed a framework for evaluating the {\it user-intent formalization} problem for mainstream programming…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

MethodsAttention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer