FixEval: Execution-based Evaluation of Program Fixes for Programming Problems
Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, and Chris Brown

TL;DR
FixEval introduces a benchmark with buggy code and fixes, emphasizing execution-based evaluation over match-based metrics to better assess the correctness of automatically generated program fixes.
Contribution
The paper presents FixEval, a new benchmark dataset with execution-based evaluation metrics for assessing model-generated code fixes in programming problems.
Findings
Execution-based metrics assess the correctness of generated fixes more accurately than match-based metrics.
Transformer models show limited success in automatic bug fixing.
FixEval facilitates more realistic evaluation of program repair models.
Abstract
The complexity of modern software has led to a drastic increase in the time and cost associated with detecting and rectifying software bugs. In response, researchers have explored various methods to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible fixes for any given bug, few tools and datasets are available to evaluate model-generated fixes effectively. To address this issue, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes. FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes and assess further information regarding time, memory constraints, and acceptance based on a verdict. We consider two Transformer language models pretrained on programming languages as our baseline and compare…
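To make the contrast between match-based and execution-based evaluation concrete, here is a minimal Python sketch. It is not the FixEval harness: the function names, the verdict strings, the toy "sum two integers" problem, and the test cases are all assumptions made for illustration, and only a time limit is enforced (no memory limit).

```python
import os
import subprocess
import sys
import tempfile


def exact_match(candidate: str, reference: str) -> bool:
    """Match-based check: token-level equality after whitespace normalization."""
    return candidate.split() == reference.split()


def run_verdict(source: str, tests, time_limit: float = 5.0) -> str:
    """Execution-based check: run the program on every (stdin, expected stdout) pair.

    Returns an illustrative verdict string; the labels are hypothetical and
    chosen to resemble typical online-judge verdicts.
    """
    # Write the candidate program to a temporary file so it can be executed.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        for stdin_text, expected in tests:
            try:
                proc = subprocess.run(
                    [sys.executable, path],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=time_limit,
                )
            except subprocess.TimeoutExpired:
                return "Time Limit Exceeded"
            if proc.returncode != 0:
                return "Runtime Error"
            # Compare outputs token by token, ignoring whitespace differences.
            if proc.stdout.split() != expected.split():
                return "Wrong Answer"
        return "Accepted"
    finally:
        os.remove(path)


# Toy problem (hypothetical): read two integers and print their sum.
reference_fix = "a, b = map(int, input().split())\nprint(a + b)\n"
candidate_fix = "x, y = [int(t) for t in input().split()]\nprint(x + y)\n"
buggy_program = "a, b = map(int, input().split())\nprint(a - b)\n"
tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]

if __name__ == "__main__":
    print("exact match:", exact_match(candidate_fix, reference_fix))
    print("candidate verdict:", run_verdict(candidate_fix, tests))
    print("buggy verdict:", run_verdict(buggy_program, tests))
```

In this sketch the candidate fix differs textually from the reference, so a match-based check rejects it, while running it against the tests shows that it behaves correctly. That gap is the kind of case execution-based evaluation is meant to capture.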
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Softmax · Adam · Position-Wise Feed-Forward Layer · Dropout · Residual Connection
