Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization
Agnieszka Ciborowska, Kostadin Damevski

TL;DR
This paper investigates using data augmentation to generate synthetic training data for transformer-based deep learning models that perform changeset-based bug localization, with the aim of improving performance when few real bug reports are available.
Contribution
It introduces novel data augmentation operators and a data balancing strategy to enhance bug localization models trained on scarce bug report data.
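The summary does not spell out the paper's operators or its balancing strategy. The sketch below only illustrates the general idea with generic text-augmentation operators (random token deletion and random token swap, two of the operators from the "Easy Data Augmentation" family) plus a simple oversampling balancer. All function names, the grouping key, and parameter values such as `p=0.1` are hypothetical and are not the authors' method.

```python
import random
from typing import Dict, List


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Generic operator (not the paper's): drop each token with probability p."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty report: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Generic operator (not the paper's): swap two random positions n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment_report(text: str, n_variants: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: create synthetic copies of one bug report."""
    rng = random.Random(seed)
    tokens = text.split()
    variants = []
    for k in range(n_variants):
        # Alternate the two operators so the variants differ in kind.
        if k % 2 == 0:
            new_tokens = random_deletion(tokens, p=0.1, rng=rng)
        else:
            new_tokens = random_swap(tokens, n_swaps=1, rng=rng)
        variants.append(" ".join(new_tokens))
    return variants


def balance_groups(groups: Dict[str, List[str]], seed: int = 0) -> Dict[str, List[str]]:
    """Hypothetical balancer: pad every non-empty group with synthetic reports
    until it matches the size of the largest group."""
    rng = random.Random(seed)
    target = max(len(reports) for reports in groups.values())
    balanced = {}
    for name, reports in groups.items():
        padded = list(reports)
        while len(padded) < target:
            source = rng.choice(reports)
            padded.extend(augment_report(source, 1, seed=rng.randrange(10**6)))
        balanced[name] = padded
    return balanced
```

For example, `balance_groups({"ui": ["button misaligned"], "io": ["file not saved", "read fails", "disk error on write"]})` returns three reports per group, with two synthetic variants added to `"ui"`. Oversampling through augmentation, rather than duplicating reports verbatim, is one common way to avoid feeding a model identical copies, but whether it matches the paper's strategy is an assumption.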
Findings
Synthetic data improves bug localization accuracy
Data augmentation helps models generalize better across code bases
Proposed methods outperform baseline models without augmentation
Abstract
Modern Deep Learning (DL) architectures based on transformers (e.g., BERT, RoBERTa) are exhibiting performance improvements across a number of natural language tasks. While such DL models have shown tremendous potential for use in software engineering applications, they are often hampered by insufficient training data. Particularly constrained are applications that require project-specific data, such as bug localization, which aims at recommending code to fix a newly submitted bug report. Deep learning models for bug localization require a substantial training set of fixed bug reports, which are available only in limited quantity even in popular and actively developed software projects. In this paper, we examine the effect of using synthetic training data on transformer-based DL models that perform a more complex variant of bug localization, which has the goal of retrieving bug-inducing changesets…
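Changeset-based bug localization can be framed as a retrieval problem: given a bug report, rank candidate changesets so that the bug-inducing one appears near the top. The minimal sketch below shows only that framing. It deliberately swaps in a plain token-overlap (Jaccard) score where the paper uses a trained transformer model, and `rank_changesets`, `reciprocal_rank`, and the example changeset IDs are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Stand-in relevance score (NOT a transformer): token-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def rank_changesets(
    report: str,
    changesets: Dict[str, str],
    score: Callable[[str, str], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Score every candidate changeset against the report, best first."""
    scored = [(cid, score(report, text)) for cid, text in changesets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def reciprocal_rank(ranking: List[Tuple[str, float]], inducing_id: str) -> float:
    """1/position of the known bug-inducing changeset, or 0.0 if absent."""
    for position, (cid, _) in enumerate(ranking, start=1):
        if cid == inducing_id:
            return 1.0 / position
    return 0.0
```

Because the scorer is passed in as a parameter, a learned model could replace `jaccard` without changing the ranking code; averaging `reciprocal_rank` over many bug reports gives the mean reciprocal rank, a common retrieval metric.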
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques
