Gotta catch 'em all! Towards File Localisation from Issues at Large

Jesse Maarleveld; Jiapan Guo; Daniel Feitosa

arXiv:2507.18319·cs.SE·July 25, 2025

Open Access

TL;DR

This paper explores file localisation from all types of issues, not just bugs, by creating a dataset and evaluating traditional information retrieval methods, revealing the need for general-purpose models and project-specific tuning.

Contribution

It introduces a data pipeline for issue file localisation datasets, evaluates baseline IR methods on diverse issues, and analyzes biases affecting localisation performance.

Findings

01. Traditional bug-specific heuristics perform poorly on general issues.

02. Small but significant differences exist between issue types.

03. Identifiers have a minor impact on localisation performance.

Abstract

Bug localisation, the development of methods that localise the files requiring changes to resolve bugs, has long been researched with the aim of saving developers' time. Recently, researchers have started to consider issues other than bugs. Nevertheless, most existing research into file localisation from issues focusses on bugs or uses other selection methods to ensure that only certain types of issues are considered. Our goal is to work on all issues at large, without any specific selection. In this work, we provide a data pipeline for the creation of issue file localisation datasets, capable of dealing with arbitrary branching and merging practices. We provide a baseline performance evaluation for the file localisation problem using traditional information retrieval approaches. Finally, we use statistical analysis to…
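To make the idea of an information retrieval baseline for file localisation concrete, here is a minimal sketch that ranks files against an issue by TF-IDF cosine similarity. The abstract does not say which retrieval models the authors evaluate, so the choice of TF-IDF, the identifier-splitting tokeniser, and all names (`tokenize`, `rank_files`, the example paths and issue text) are assumptions made for illustration, not the paper's pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; snake_case and camelCase identifiers are split."""
    tokens = []
    for part in re.findall(r"[A-Za-z]+", text):
        pieces = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", part)
        tokens.extend(p.lower() for p in pieces)
    return tokens


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per tokenised document, plus the IDF table."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Smoothed IDF so that terms present in every file keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(issue_text, files):
    """Rank file paths by similarity to the issue text, best match first."""
    paths = list(files)
    vectors, idf = tfidf_vectors([tokenize(files[p]) for p in paths])
    # Issue terms that occur in no file cannot contribute to any score.
    counts = Counter(tokenize(issue_text))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(p, cosine(query, v)) for p, v in zip(paths, vectors)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical repository snapshot and issue, for illustration only.
files = {
    "src/auth/login.py": "def login(user, password): check password hash and create session",
    "src/ui/theme.py": "def apply_theme(name): load colour palette for dark mode",
    "docs/README.md": "installation guide and usage notes",
}
issue = "Add dark mode toggle to the theme settings"

ranking = rank_files(issue, files)
for path, score in ranking:
    print(f"{score:.3f}  {path}")
```

The example issue is a feature request, not a bug report, which is the kind of input the paper targets. A purely lexical ranker like this one depends on vocabulary shared between the issue text and the file contents.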

Taxonomy

Topics: Scientific Computing and Data Management