Improving Automated Program Repair with Domain Adaptation
Armin Zirak, Hadi Hemmati

TL;DR
This paper addresses the domain shift challenge in Automated Program Repair by proposing a domain adaptation framework and synthetic data generation, significantly improving model effectiveness across diverse projects.
Contribution
It introduces a novel domain adaptation framework and a synthetic bug data generation method to enhance APR models' generalizability to new projects.
Findings
Domain adaptation improves APR effectiveness by up to 23.4%.
Synthetic data generation enhances zero-shot learning performance.
Framework benefits state-of-the-art APR tools on multiple projects.
Abstract
Automated Program Repair (APR) is the process of fixing a bug/defect in source code with an automated tool. APR tools have recently achieved promising results by leveraging state-of-the-art Natural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE, which combine text-to-text transformers with software-specific techniques, currently outperform the alternatives. However, in most APR studies the train and test sets are chosen from the same set of projects. In reality, APR models are meant to generalize to new and different projects. Therefore, there is a potential threat that APR models reported as highly effective perform poorly when the characteristics of a new project or its bugs differ from those of the training set (domain shift). In this study, we first define and measure the domain shift problem in automated program repair.…
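To make the evaluation concern above concrete, here is a minimal Python sketch contrasting a random split, where train and test samples come from the same projects, with a cross-project split that holds out entire projects so the test set simulates unseen projects. The record layout, the project names, and the helpers `mixed_split`, `cross_project_split`, and `project_overlap` are hypothetical illustrations, not the authors' code or benchmark.

```python
import random
from typing import List, Set, Tuple

# Hypothetical record layout: (project_name, buggy_code, fixed_code).
Sample = Tuple[str, str, str]


def mixed_split(samples: List[Sample], test_ratio: float = 0.2, seed: int = 0):
    """Random split: train and test are drawn from the same pool of projects."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def cross_project_split(samples: List[Sample], held_out: List[str]):
    """Hold out whole projects so every test sample comes from an unseen project."""
    held = set(held_out)
    train = [s for s in samples if s[0] not in held]
    test = [s for s in samples if s[0] in held]
    return train, test


def project_overlap(train: List[Sample], test: List[Sample]) -> Set[str]:
    """Projects that appear in both splits (empty set means no project leakage)."""
    return {s[0] for s in train} & {s[0] for s in test}


# Toy data: three hypothetical projects with ten placeholder bug/fix pairs each.
samples = [(p, f"bug{i}", f"fix{i}") for p in ["proj_a", "proj_b", "proj_c"] for i in range(10)]

train, test = mixed_split(samples)
train2, test2 = cross_project_split(samples, held_out=["proj_c"])

print(sorted(project_overlap(train, test)))    # non-empty: test projects were also seen in training
print(sorted(project_overlap(train2, test2)))  # []: proj_c never appears in training
```

The random split leaks project-specific characteristics into training, which can inflate reported effectiveness; the cross-project split is the setting where a domain shift between training and target projects becomes visible.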
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
Methods: Repair · Test
