FixEval: Execution-based Evaluation of Program Fixes for Programming Problems

Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, and Chris Brown

arXiv:2206.07796·cs.SE·March 31, 2023·5 cites

TL;DR

FixEval introduces a benchmark with buggy code and fixes, emphasizing execution-based evaluation over match-based metrics to better assess the correctness of automatically generated program fixes.

Contribution

The paper presents FixEval, a new benchmark dataset with execution-based evaluation metrics for assessing model-generated code fixes in programming problems.

Findings

01

Execution-based metrics assess the correctness of generated fixes more accurately than match-based metrics.

02

Transformer models show limited success in automatic bug fixing.

03

FixEval facilitates more realistic evaluation of program repair models.
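The gap between match-based and execution-based scoring can be illustrated with a toy example. This is a minimal sketch, not the paper's metrics or harness: the `total` function, the three test cases, and the whitespace-token Jaccard overlap are all hypothetical stand-ins chosen for brevity.

```python
from typing import Any, List, Tuple

# Hypothetical reference fix and two model outputs for a "sum a list" task.
REFERENCE_FIX = "def total(xs):\n    return sum(xs)\n"
CORRECT_BUT_DIFFERENT = (
    "def total(xs):\n"
    "    acc = 0\n"
    "    for x in xs:\n"
    "        acc += x\n"
    "    return acc\n"
)
SIMILAR_BUT_WRONG = "def total(xs):\n    return sum(xs[1:])\n"

# Hypothetical unit tests: (positional arguments, expected return value).
TESTS: List[Tuple[tuple, Any]] = [(([1, 2, 3],), 6), (([],), 0), (([5],), 5)]


def token_overlap(candidate: str, reference: str) -> float:
    """Match-based stand-in: Jaccard overlap of whitespace-separated tokens."""
    cand, ref = set(candidate.split()), set(reference.split())
    return len(cand & ref) / len(cand | ref)


def passes_all_tests(source: str, func_name: str, tests) -> bool:
    """Execution-based check: define the function, then run every test."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable here only because the inputs are trusted
    func = namespace[func_name]
    return all(func(*args) == expected for args, expected in tests)


for label, src in [("correct but different", CORRECT_BUT_DIFFERENT),
                   ("similar but wrong", SIMILAR_BUT_WRONG)]:
    print(label,
          round(token_overlap(src, REFERENCE_FIX), 2),
          passes_all_tests(src, "total", TESTS))
```

In this sketch the wrong program scores higher on token overlap than the correct one, while only the correct one passes the tests. That is the kind of mismatch an execution-based benchmark is meant to expose.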

Abstract

The complexity of modern software has led to a drastic increase in the time and cost associated with detecting and rectifying software bugs. In response, researchers have explored various methods to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible fixes for any given bug, few tools and datasets are available to evaluate model-generated fixes effectively. To address this issue, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes. FixEval offers an extensive collection of unit tests to evaluate the correctness of model-generated program fixes and to assess further information regarding time, memory constraints, and acceptance based on a verdict. We consider two Transformer language models pretrained on programming languages as our baseline and compare…
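A verdict-style evaluation of the kind the abstract describes might look roughly like the sketch below. It is an assumption-laden illustration, not FixEval's actual harness: the `judge` function, the verdict strings (modeled on typical competitive-programming judges), the 2-second default limit, and the "add two integers" task are all hypothetical, and memory limits are omitted for brevity.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Tuple


def judge(source: str, tests: List[Tuple[str, str]], time_limit_s: float = 2.0) -> str:
    """Run a Python program on (stdin, expected stdout) pairs and return a verdict.

    Verdict names are assumptions for illustration, not taken from the paper.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution.py")
        with open(path, "w") as handle:
            handle.write(source)
        for stdin_text, expected in tests:
            try:
                result = subprocess.run(
                    [sys.executable, path],
                    input=stdin_text,
                    capture_output=True,
                    text=True,
                    timeout=time_limit_s,
                )
            except subprocess.TimeoutExpired:
                return "Time Limit Exceeded"
            if result.returncode != 0:
                return "Runtime Error"
            # Compare token-wise so trailing whitespace does not matter.
            if result.stdout.split() != expected.split():
                return "Wrong Answer"
    return "Accepted"


# Hypothetical task: read two integers and print their sum.
TESTS = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
BUGGY = "a, b = map(int, input().split())\nprint(a - b)\n"
FIXED = "a, b = map(int, input().split())\nprint(a + b)\n"

if __name__ == "__main__":
    print("buggy:", judge(BUGGY, TESTS))
    print("fixed:", judge(FIXED, TESTS))
```

Running untrusted submissions this way would need sandboxing and resource limits in practice; the sketch only shows how test outcomes, timeouts, and crashes map onto a single verdict.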

Code & Models

Repositories

mahimanzum/fixeval (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Softmax · Adam · Position-Wise Feed-Forward Layer · Dropout · Residual Connection