Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval
Huihui Huang, Ratnadira Widyasari, Ting Zhang, Ivana Clairine Irsan, Jieke Shi, Han Wei Ang, Frank Liauw, Eng Lieh Ouh, Lwin Khin Shar, Hong Jin Kang, David Lo

TL;DR
This paper introduces a realistic evaluation framework for issue-commit linking, revealing the limitations of current deep learning methods, and proposes EasyLink, a new approach that significantly improves link retrieval accuracy using LLMs and vector databases.
Contribution
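The paper proposes the Realistic Distribution Setting (RDS) and uses it to build a realistic evaluation dataset of 20 open-source projects for issue-commit linking. It also introduces EasyLink, a method that retrieves candidate commits with vector retrieval and then reranks them with an LLM to improve linking accuracy.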
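EasyLink is described here only at a high level: vector retrieval followed by LLM-based reranking. The sketch below illustrates that general retrieve-then-rerank pattern under explicit assumptions. A bag-of-words count vector stands in for a learned embedding model. `InMemoryVectorStore` is a hypothetical toy substitute for a real vector database. `fake_llm_score` is a hypothetical placeholder for prompting an LLM and parsing a relevance score. This is an illustration of the pattern, not the authors' implementation.

```python
import math
import re
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens.
    return re.findall(r"[a-z0-9_]+", text.lower())


def embed(text: str) -> Counter:
    # ASSUMPTION: a sparse term-count vector stands in for a dense embedding model.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class InMemoryVectorStore:
    """Hypothetical toy stand-in for a vector database: stores (id, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, item_id: str, text: str) -> None:
        self.items.append((item_id, embed(text)))

    def search(self, query: str, k: int) -> list[str]:
        # Return the ids of the k stored items most similar to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [item_id for item_id, _ in ranked[:k]]


def link_issue(issue: str, commits: dict[str, str], store: InMemoryVectorStore,
               k: int, rerank_score: Callable[[str, str], float]) -> list[str]:
    # Stage 1: cheap vector retrieval narrows all commits to k candidates.
    candidates = store.search(issue, k)
    # Stage 2: a more expensive scorer (an LLM in EasyLink) reorders only those candidates.
    return sorted(candidates, key=lambda cid: rerank_score(issue, commits[cid]), reverse=True)


def fake_llm_score(issue: str, commit_msg: str) -> float:
    # HYPOTHETICAL placeholder for asking an LLM "does this commit fix this issue?";
    # here it is simply the fraction of issue tokens that appear in the commit message.
    issue_toks = set(tokenize(issue))
    return len(issue_toks & set(tokenize(commit_msg))) / max(len(issue_toks), 1)


commits = {
    "c1": "Fix NullPointerException in parser when config file is empty",
    "c2": "Update README badges",
    "c3": "Refactor parser module layout",
}
store = InMemoryVectorStore()
for cid, msg in commits.items():
    store.add(cid, msg)

ranked = link_issue("Parser crashes with NullPointerException on empty config file",
                    commits, store, k=2, rerank_score=fake_llm_score)
print(ranked)  # → ['c1', 'c3']
```

The split keeps the expensive scorer off most of the repository: only the `k` retrieved candidates are ever reranked, which is what makes an LLM-based second stage affordable when a repository has many commits.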
Findings
Under the realistic setting, the performance of the state-of-the-art deep learning-based approach drops by more than half.
The traditional information retrieval method VSM (Vector Space Model) outperforms deep learning methods under the realistic evaluation.
EasyLink achieves an average Precision@1 of 75.03%, more than four times that of the previous state-of-the-art approach.
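For readers unfamiliar with the baseline and metric named in the findings, the sketch below shows a classic VSM ranker (TF-IDF weighting with cosine similarity) and Precision@1, read here as the fraction of issues whose top-ranked commit is a true fix commit. The smoothed IDF formula, the tokenizer, and the toy issues and commits are assumptions for illustration; the paper's exact preprocessing and metric details may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9_]+", text.lower())


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    # ASSUMPTION: smoothed IDF (one common variant); always strictly positive.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Terms unseen in the commit corpus get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vsm_rank(issue: str, commits: dict[str, str]) -> list[str]:
    # Rank all commits by TF-IDF cosine similarity to the issue text.
    commit_tokens = {cid: tokenize(msg) for cid, msg in commits.items()}
    idf = build_idf(list(commit_tokens.values()))
    q = tfidf(tokenize(issue), idf)
    scores = {cid: cosine(q, tfidf(toks, idf)) for cid, toks in commit_tokens.items()}
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_1(rankings: dict[str, list[str]], gold: dict[str, set[str]]) -> float:
    # Share of issues whose top-ranked commit is one of its true fix commits.
    hits = sum(1 for iid, ranked in rankings.items() if ranked and ranked[0] in gold[iid])
    return hits / len(rankings) if rankings else 0.0


commits = {
    "c1": "Fix crash when parsing empty config file",
    "c2": "Add retry logic to HTTP client timeout handling",
    "c3": "Bump version to 2.0",
}
issues = {
    "I1": "Crash on empty config file",
    "I2": "HTTP client timeout is not retried",
}
gold = {"I1": {"c1"}, "I2": {"c2"}}

rankings = {iid: vsm_rank(text, commits) for iid, text in issues.items()}
print(precision_at_1(rankings, gold))  # → 1.0
```

In this toy example every issue shares words with its fix, so VSM scores perfectly; the findings above concern real repositories, where many unrelated commits also share vocabulary with the issue.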
Abstract
Issue-commit linking, which connects issues with the commits that fix them, is crucial for software maintenance. Existing approaches have shown promise in automatically recovering these links. Evaluations of these techniques assess their ability to distinguish genuine links from plausible but false ones. However, these evaluations overlook the fact that, in reality, a repository with more commits contains more plausible yet unrelated commits, which may make it harder for a tool to single out the correct fix commits. To address this, we propose the Realistic Distribution Setting (RDS) and use it to construct a more realistic evaluation dataset covering 20 open-source projects. By evaluating tools on this dataset, we observe that the performance of the state-of-the-art deep learning-based approach drops by more than half, while the traditional Information Retrieval method, VSM,…
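The abstract's argument is that more commits mean more plausible distractors. A back-of-the-envelope way to see this: with one true fix among n candidates, a ranker that orders candidates uniformly at random has an expected Precision@1 of 1/n. The sketch contrasts a sampled candidate pool (true fixes plus a fixed number of false links, with a hypothetical `n_negatives=9`) against a pool containing every repository commit. Treating RDS as "every commit is a candidate" is an assumption made for illustration; the paper's actual construction may apply additional filtering. The function names are hypothetical.

```python
import random


def sampled_candidates(fix_commits: set[str], all_commits: list[str],
                       n_negatives: int, seed: int = 0) -> list[str]:
    # ASSUMED earlier-style setting: true fixes plus a fixed-size sample of false links.
    rng = random.Random(seed)
    negatives = [c for c in all_commits if c not in fix_commits]
    return sorted(fix_commits) + rng.sample(negatives, min(n_negatives, len(negatives)))


def realistic_candidates(fix_commits: set[str], all_commits: list[str]) -> list[str]:
    # ASSUMED RDS-style setting: every commit in the repository is a candidate,
    # so the number of distractors grows with repository size.
    return list(all_commits)


def random_ranker_p_at_1(candidates: list[str], fix_commits: set[str]) -> float:
    # Expected Precision@1 of a ranker that orders candidates uniformly at random.
    return len(fix_commits & set(candidates)) / len(candidates)


all_commits = [f"c{i}" for i in range(1000)]
fixes = {"c42"}

p_sampled = random_ranker_p_at_1(sampled_candidates(fixes, all_commits, n_negatives=9), fixes)
p_realistic = random_ranker_p_at_1(realistic_candidates(fixes, all_commits), fixes)
print(p_sampled)    # → 0.1
print(p_realistic)  # → 0.001
```

The same chance baseline falls by two orders of magnitude once the candidate pool reflects repository size, which is why a tool's scores under a sampled setting need not carry over to realistic repositories.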
Taxonomy
Topics: Library Science and Information Systems · Data Quality and Management · Natural Language Processing Techniques
