A Controlled Experiment of Different Code Representations for Learning-Based Bug Repair
Marjane Namavar, Noor Nashid, Ali Mesbah

TL;DR
This study systematically evaluates how different code representations affect deep learning models for bug repair, revealing that mixed representations often outperform homogeneous ones and that bug type influences optimal representation choices.
Contribution
The paper reports a controlled experiment that measures how different code representations affect model accuracy and the usefulness of suggested fixes in learning-based bug repair.
Findings
Mixed representations can outperform homogeneous ones.
Code abstraction may reduce the practical usefulness of fixes.
Bug type influences the effectiveness of code representations.
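To make the abstraction finding concrete, here is a minimal Python sketch of identifier abstraction, assuming abstraction means replacing each distinct identifier with a numbered placeholder. The function names, the `ID_n` placeholder scheme, and the example statement are all hypothetical and are not taken from the paper. The sketch shows one plausible way an abstracted fix can lose usefulness: a placeholder that never appeared in the buggy code has no concrete name to map back to.

```python
import io
import keyword
import tokenize


def abstract_identifiers(source):
    """Replace each distinct identifier with a placeholder such as ID_1.

    Returns the abstracted token list and the name-to-placeholder mapping.
    The ID_n scheme is an illustrative assumption, not the paper's scheme.
    """
    mapping = {}
    abstracted = []
    layout = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in layout:
            continue  # drop layout-only tokens
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Reuse the placeholder if this identifier was seen before.
            placeholder = mapping.setdefault(tok.string, f"ID_{len(mapping) + 1}")
            abstracted.append(placeholder)
        else:
            abstracted.append(tok.string)
    return abstracted, mapping


def concretize(abstract_tokens, mapping):
    """Map placeholders back to names; unknown placeholders stay unresolved."""
    reverse = {placeholder: name for name, placeholder in mapping.items()}
    return [reverse.get(tok, tok) for tok in abstract_tokens]


# Hypothetical name-based bug: the second argument should have been `height`.
buggy = "area = compute_area(width, width)\n"
tokens, names = abstract_identifiers(buggy)

# A hypothetical model output that introduces a placeholder unseen in the input.
suggested_fix = ["ID_1", "=", "ID_2", "(", "ID_3", ",", "ID_4", ")"]
print(concretize(suggested_fix, names))
```

In this sketch `ID_4` survives concretization unchanged, so a developer would still have to work out which real name it stands for.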
Abstract
Training a deep learning model on source code has gained significant traction recently. Since such models reason about vectors of numbers, source code needs to be converted to a code representation before vectorization. Numerous approaches have been proposed to represent source code, from sequences of tokens to abstract syntax trees. However, there is no systematic study to understand the effect of code representation on learning performance. Through a controlled experiment, we examine the impact of various code representations on model accuracy and usefulness in deep learning-based program repair. We train 21 different generative models that suggest fixes for name-based bugs, including 14 different homogeneous code representations, four mixed representations for the buggy and fixed code, and three different embeddings. We assess if fix suggestions produced by the model in various code…
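The abstract contrasts token sequences with abstract syntax trees as two ends of the representation spectrum. The following Python sketch shows what each might look like for a single statement, using only the standard-library `tokenize` and `ast` modules. It is an illustration under assumptions: the example statement and helper names are hypothetical, and the paper's own tooling and target language may differ.

```python
import ast
import io
import tokenize

# Hypothetical statement containing a name-based bug (wrong second argument).
SNIPPET = "area = compute_area(width, width)\n"


def token_sequence(source):
    """Return the lexical tokens of `source` as a flat list of strings."""
    skip = {
        tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
        tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT,
    }
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in skip]


def ast_node_types(source):
    """Return the AST node type names in the order produced by ast.walk."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


print(token_sequence(SNIPPET))   # flat, order-preserving view of the code
print(ast_node_types(SNIPPET))   # structural view: Module, Assign, Call, ...
```

The token view keeps surface order and punctuation, while the tree view exposes structure such as the call and its arguments; either would then be mapped to vectors before training.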
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
