Think Harder and Don't Overlook Your Options: Revisiting Issue-Commit Linking with LLM-Assisted Retrieval
Cole Morgan, Muhammad Asaduzzaman, Shaiful Chowdhury, Shaowei Wang

TL;DR
This paper evaluates retrieval and reranking techniques, including LLMs, for linking issue reports to the commits that resolve them, and finds that dense retrieval and traditional machine learning rerankers are the most effective.
Contribution
It provides a comprehensive comparison of established and modern retrieval and reranking techniques, highlighting the practicality of dense retrieval for candidate selection and of traditional machine learning rerankers over LLM-based ones.
Findings
Dense retrieval outperforms sparse methods at identifying relevant commits.
Combining dense and sparse retrieval improves recall.
Traditional machine learning rerankers outperform LLM-based approaches.
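The recall finding can be illustrated with a small sketch. This summary does not say how the paper combines the two retrievers, so the plain union of top-k lists below is an assumption, and the commit IDs and rankings are made-up toy data rather than results from the study. The point is only that a union can never have lower recall than either list alone, at the cost of a larger candidate set (up to 2k commits).

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant commits that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def union_candidates(sparse_ranked, dense_ranked, k):
    """Top-k of each ranking, concatenated with duplicates dropped (assumed strategy)."""
    merged, seen = [], set()
    for commit in sparse_ranked[:k] + dense_ranked[:k]:
        if commit not in seen:
            seen.add(commit)
            merged.append(commit)
    return merged


# Toy rankings for one issue (hypothetical commit IDs, not data from the paper).
sparse = ["c1", "c7", "c3", "c9"]   # e.g. a BM25-style ranking
dense = ["c4", "c1", "c8", "c2"]    # e.g. an embedding-based ranking
relevant = {"c3", "c4", "c5"}       # commits that truly resolve the issue

k = 3
hybrid = union_candidates(sparse, dense, k)
sparse_recall = recall_at_k(sparse, relevant, k)
dense_recall = recall_at_k(dense, relevant, k)
hybrid_recall = recall_at_k(hybrid, relevant, len(hybrid))
```

In this toy case each retriever finds a different relevant commit, so the union recovers both. When the two rankings overlap heavily, the union adds candidates without adding recall, which is why the gain has to be measured rather than assumed.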
Abstract
Linking issue reports to the commits that resolve them is essential for software traceability, maintenance, and evolution. Accurate issue-commit links help developers to understand system changes and the rationale behind them. While numerous automated techniques have been proposed, ranging from heuristic and feature-based approaches to modern deep learning and large language model approaches, our goal is to evaluate these techniques to determine which are most effective and efficient. In this study, we revisit several established issue-commit link recovery techniques, including BTLink, EasyLink, FRLink, RCLinker, and Hybrid-Linker, and assess their performance for reranking issue-commit links. We first evaluate different retrieval methods (BM25, BM25L, SBERT-Semantic Search, ANNOY, LSH, HNSW) for their ability to efficiently retrieve relevant commits, reducing the candidate set that…
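As a rough illustration of the retrieval stage the abstract describes, the sketch below implements Okapi BM25 in plain Python and uses it to narrow a set of commit messages to a few candidates for one issue. It is a minimal sketch and not the authors' implementation: the whitespace tokenizer, the parameter values k1=1.5 and b=0.75 (common defaults), the smoothed IDF variant, and the toy commit messages are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 scorer over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.docs]
        self.avgdl = sum(self.lengths) / len(docs)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(term for tf in self.docs for term in tf)
        # Smoothed IDF that stays non-negative for very common terms.
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0) for t, c in df.items()}

    def score(self, query, i):
        tf = self.docs[i]
        norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def top_k(self, query, k):
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


# Toy commit messages (hypothetical, not from the paper's dataset).
commits = [
    "fix null pointer in login handler",
    "update readme badges",
    "refactor payment module tests",
    "handle null session token during login",
]
issue = "login crashes with null pointer"

index = BM25(commits)
candidates = index.top_k(issue, 2)  # indices of the commits passed on to a reranker
```

A reranker would then score only the commits in `candidates` instead of the full history, which is the candidate-set reduction the abstract refers to.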
