An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications
Mohannad Alhanahnah, Md Rashedul Hasan, Lisong Xu, Hamid Bagheri

TL;DR
This paper empirically evaluates the effectiveness of large language models in repairing declarative formal specifications, specifically Alloy, demonstrating that dual-agent with auto-prompting approaches outperform existing methods.
Contribution
It is the first systematic study assessing LLMs for repairing declarative specifications, incorporating recent concepts like agents, feedback, and auto-prompting in this context.
Findings
Dual-agent with auto-prompting outperforms other configurations.
LLMs can effectively repair Alloy specifications, surpassing state-of-the-art techniques.
Auto-prompting increases iterations and token usage but improves repair success.
Abstract
Automatic Program Repair (APR) has garnered significant attention as a practical research domain focused on automatically fixing bugs in programs. While existing APR techniques primarily target imperative programming languages like C and Java, there is a growing need for effective solutions applicable to declarative software specification languages. This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications in Alloy, a declarative formal language used for software specification. We designed 12 different repair settings, encompassing single-agent and dual-agent paradigms, utilizing various LLMs. These configurations also incorporate different levels of feedback, including an auto-prompting mechanism for generating prompts autonomously using LLMs. Our study reveals that dual-agent with auto-prompting setup outperforms the other…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Scientific Computing and Data Management
