Fast Changeset-based Bug Localization with BERT
Agnieszka Ciborowska, Kostadin Damevski

TL;DR
This paper presents a fast BERT-based approach to bug localization that matches bug reports to the code changes that induced them despite the lexical gap between the two, improving both retrieval accuracy and response time.
Contribution
The authors develop a computationally efficient BERT-based model tailored for changeset-based bug localization, addressing the challenge of lexical gaps and response time constraints.
Findings
The proposed BERT model outperforms non-contextual baselines.
The model improves accuracy on bug reports that contain minimal hints.
Its improved speed makes BERT practical for real-time bug localization.
Abstract
Automatically localizing software bugs to the changesets that induced them has the potential to improve software developer efficiency and to positively affect software quality. To facilitate this automation, a bug report has to be effectively matched with source code changes, even when a significant lexical gap exists between natural language used to describe the bug and identifier naming practices used by developers. To bridge this gap, we need techniques that are able to capture software engineering-specific and project-specific semantics in order to detect relatedness between the two types of documents that goes beyond exact term matching. Popular transformer-based deep learning architectures, such as BERT, excel at leveraging contextual information, hence appear to be a suitable candidate for the task. However, BERT-like models are computationally expensive, which precludes them…
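To make the lexical gap concrete, the following is a minimal sketch of the kind of exact-term-matching ranking the abstract argues is insufficient. It is not the authors' method and uses no BERT model; the function names (`tokenize`, `cosine`, `rank_changesets`), the bug report text, and the two changesets are all hypothetical and chosen only for illustration. The example assumes that changeset `c1` is the one that induced the bug.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, splitting camelCase and snake_case identifiers."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_changesets(bug_report, changesets):
    """Return (score, changeset_id) pairs, highest score first."""
    query = Counter(tokenize(bug_report))
    scored = [
        (cosine(query, Counter(tokenize(text))), cid)
        for cid, text in changesets.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical data: c1 is assumed to be the bug-inducing change, but it
# shares no terms with the report; c2 is unrelated yet overlaps lexically.
bug_report = "App freezes when saving a large file"
changesets = {
    "c1": "fix: writeBuffer blocks UI thread in persistDocument",
    "c2": "rename file icons in saving dialog",
}

ranking = rank_changesets(bug_report, changesets)
for score, cid in ranking:
    print(f"{cid}: {score:.3f}")
```

Under these assumptions the unrelated changeset `c2` is ranked first because it shares the words "file" and "saving" with the report, while `c1` scores zero. Closing this kind of gap is the motivation the abstract gives for using contextual models such as BERT.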
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Data Quality and Management
