Do AI models help produce verified bug fixes?

Li Huang; Ilgiz Mustafin; Marco Piccioni; Alessandro Schena; Reto Weber; Bertrand Meyer

arXiv:2507.15822 · cs.SE · August 5, 2025

TL;DR

This study investigates whether AI models, specifically Large Language Models (LLMs), help programmers produce verified bug fixes. It uses formal proof tools to check that each fix is correct and analyzes how programmers behave when they use the models.

Contribution

The paper introduces a detailed experimental methodology for evaluating LLMs in debugging, analyzes how programmers interact with them, and offers validated guidelines for using LLMs effectively in automatic program repair.

Findings

01. Surprising results regarding AI effectiveness in debugging.
02. Identification of 7 distinct patterns of LLM usage.
03. Validated advice for optimizing LLM use in bug fixing.

Abstract

Among the areas of software engineering where AI techniques (particularly Large Language Models, or LLMs) seem poised to yield dramatic improvements, an attractive candidate is Automatic Program Repair (APR), the production of satisfactory corrections to software bugs. Does this expectation materialize in practice? How do we find out, while making sure that proposed corrections actually work? If programmers have access to LLMs, how do they actually use them to complement their own skills? To answer these questions, we took advantage of an available program-proving environment, which formally determines the correctness of proposed fixes, to conduct a study of program debugging with two randomly assigned groups of programmers: one with access to LLMs and one without, both validating their answers through the proof tools. The methodology relied on a division into general research…

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Scientific Computing and Data Management