Gotta catch 'em all! Towards File Localisation from Issues at Large
Jesse Maarleveld, Jiapan Guo, Daniel Feitosa

TL;DR
This paper explores file localisation from all types of issues, not just bugs, by creating a dataset and evaluating traditional information retrieval methods, revealing the need for general-purpose models and project-specific tuning.
Contribution
It introduces a data pipeline for creating issue file localisation datasets, evaluates baseline IR methods on diverse issue types, and analyses biases affecting localisation performance.
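To make the idea of an IR baseline concrete, here is a minimal sketch of file localisation as a ranking problem: the issue text is the query, each source file is a document, and files are ordered by TF-IDF cosine similarity. This summary does not say which IR methods the paper evaluates, so TF-IDF is an assumption chosen as a representative traditional approach. The function names (`tokenize`, `rank_files`) and the toy repository are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for tokenised docs, plus the IDF table."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Smoothed IDF, so terms that occur in every document keep a positive weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(issue_text, files):
    """Rank files (path -> content) by similarity to the issue text, best first."""
    paths = list(files)
    vectors, idf = tfidf_vectors([tokenize(files[p]) for p in paths])
    # Query terms that never occur in the repository carry no signal and are dropped.
    query = {
        term: tf * idf[term]
        for term, tf in Counter(tokenize(issue_text)).items()
        if term in idf
    }
    scored = [(path, cosine(query, vec)) for path, vec in zip(paths, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy repository and issue, both invented for illustration.
repo = {
    "src/auth/login.py": "def login(user, password): check password hash for user session",
    "src/ui/theme.py": "dark theme colour palette for buttons",
    "docs/readme.md": "installation instructions",
}
for path, score in rank_files("Add password reset to login session", repo):
    print(f"{score:.3f}  {path}")
```

A ranking like this would typically be scored against the files actually changed to resolve each issue. The sketch makes no assumption about issue type, which is the setting the paper targets.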
Findings
Traditional bug-specific heuristics perform poorly on general issues.
Small but significant differences exist between issue types.
Identifiers have a minor impact on localisation performance.
Abstract
Bug localisation, the study of developing methods to localise the files requiring changes to resolve bugs, has been researched for a long time to develop methods capable of saving developers' time. Recently, researchers have started to consider issues other than bugs. Nevertheless, most existing research into file localisation from issues focusses on bugs or uses other selection methods to ensure only certain types of issues are considered as part of the focus of the work. Our goal is to work on all issues at large, without any specific selection. In this work, we provide a data pipeline for the creation of issue file localisation datasets, capable of dealing with arbitrary branching and merging practices. We provide a baseline performance evaluation for the file localisation problem using traditional information retrieval approaches. Finally, we use statistical analysis to…
Taxonomy
Topics: Scientific Computing and Data Management
