Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs

Mirazul Haque; Petr Babkin; Farima Farmahinifarahani; Manuela Veloso

arXiv:2505.04441·cs.LG·May 9, 2025

Open Access

TL;DR

This paper investigates augmenting LLM-based program repair prompts with execution traces. It finds that optimized prompts can improve repair performance, especially when combined with reasoning enhancements, though the benefit diminishes as trace complexity grows.

Contribution

It introduces prompt strategies incorporating execution traces for LLM-based program repair and demonstrates their effectiveness over traditional trace-free prompts.

Findings

01. Optimized prompts outperform trace-free prompts in certain configurations.

02. Effectiveness of execution traces decreases as their complexity increases.

03. Trace-based prompting can surpass finetuning smaller models on limited data.

Abstract

Large Language Models (LLMs) show promising performance on various programming tasks, including Automatic Program Repair (APR). However, most approaches to LLM-based APR are limited to the static analysis of the programs, while disregarding their runtime behavior. Inspired by knowledge-augmented NLP, in this work, we aim to remedy this potential blind spot by augmenting standard APR prompts with program execution traces. We evaluate our approach using the GPT family of models on three popular APR datasets. Our findings suggest that simply incorporating execution traces into the prompt provides a limited performance improvement over trace-free baselines, in only 2 out of 6 tested dataset / model configurations. We further find that the effectiveness of execution traces for APR diminishes as their complexity increases. We explore several strategies for leveraging traces in prompts and…
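As a rough illustration of what augmenting a repair prompt with an execution trace could look like, the sketch below records line-level events for a small buggy function and appends them to a prompt. It is not the paper's prompt format or tooling: `collect_trace`, `build_prompt`, `buggy_max`, the prompt wording, and the `max_events` cap are all hypothetical names and choices made for this example. Only `sys.settrace` and `sys.gettrace` are real Python standard library calls.

```python
import sys

# Hypothetical buggy program, kept as a string so the same text can go into the prompt.
BUGGY_SOURCE = '''def buggy_max(values):
    best = 0  # bug: wrong when every value is negative
    for v in values:
        if v > best:
            best = v
    return best
'''
namespace = {}
exec(BUGGY_SOURCE, namespace)
buggy_max = namespace["buggy_max"]


def collect_trace(func, *args, max_events=50):
    """Run func(*args); record (line number, locals) just before each line executes."""
    events = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function under test.
        if frame.f_code is not code:
            return None
        # Cap the trace length (an assumption of this sketch, to keep prompts short).
        if event == "line" and len(events) < max_events:
            relative_line = frame.f_lineno - code.co_firstlineno + 1
            events.append((relative_line, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result, error = func(*args), None
    except Exception as exc:
        result, error = None, repr(exc)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return events, result, error


def build_prompt(source, test_input, expected, events, result, error):
    """Assemble a trace-augmented repair prompt (the wording is illustrative only)."""
    outcome = f"Actual output: {result!r}" if error is None else f"Raised: {error}"
    lines = [
        "Fix the bug in the following program.",
        "",
        "Program:",
        source,
        f"Failing input: {test_input!r}",
        f"Expected output: {expected!r}",
        outcome,
        "",
        "Execution trace (line number, local variables before the line runs):",
    ]
    lines += [f"  line {n}: {local_vars}" for n, local_vars in events]
    return "\n".join(lines)


if __name__ == "__main__":
    failing_input = [-3, -1]
    events, result, error = collect_trace(buggy_max, failing_input)
    print(build_prompt(BUGGY_SOURCE, failing_input, -1, events, result, error))
```

Capping the number of recorded events is one simple way to keep a trace short. It is included here only because the abstract reports that traces become less useful as their complexity increases, and it should not be read as the strategy the authors evaluated.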

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability

Methods: Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Discriminative Fine-Tuning · Adam