An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

Mohannad Alhanahnah; Md Rashedul Hasan; Lisong Xu; Hamid Bagheri

arXiv:2404.11050·cs.SE·June 13, 2025·3 cites

An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications

Mohannad Alhanahnah, Md Rashedul Hasan, Lisong Xu, Hamid Bagheri

PDF

Open Access 1 Repo

TL;DR

This paper empirically evaluates the effectiveness of large language models in repairing declarative formal specifications, specifically Alloy, demonstrating that dual-agent with auto-prompting approaches outperform existing methods.

Contribution

It is the first systematic study assessing LLMs for repairing declarative specifications, incorporating recent concepts like agents, feedback, and auto-prompting in this context.

Findings

01

Dual-agent with auto-prompting outperforms other configurations.

02

LLMs can effectively repair Alloy specifications, surpassing state-of-the-art techniques.

03

Auto-prompting increases iterations and token usage but improves repair success.

Abstract

Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages like C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications in Alloy, a declarative formal language used for software specification. We designed 12 different repair settings, encompassing single-agent and dual-agent paradigms, utilizing various LLMs. These configurations also incorporate different levels of feedback, including an auto-prompting mechanism for generating prompts autonomously using LLMs. Our study reveals that dual-agent with auto-prompting setup outperforms the other…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

mohannadcse/alloyspecrepair
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsNatural Language Processing Techniques · Topic Modeling · Scientific Computing and Data Management