TL;DR
SweRank is a retrieve-and-rerank framework for software issue localization that outperforms prior ranking models and costly LLM-based agentic approaches. It is trained on SweLoc, a large-scale dataset curated from public GitHub repositories.
Contribution
The paper introduces SweRank, an efficient retrieve-and-rerank framework for issue localization, and SweLoc, a large-scale dataset built to train such models. SweRank achieves state-of-the-art results.
Findings
SweRank outperforms prior ranking models and LLM-based agentic systems on the SWE-Bench-Lite and LocBench benchmarks.
SweLoc, curated from public GitHub repositories, enables effective training of issue localization models on real-world data.
Training on SweLoc also improves existing retriever and reranker models on issue localization.
Abstract
Software issue localization, the task of identifying the precise code locations (files, classes, or functions) relevant to a natural language issue description (e.g., bug report, feature request), is a critical yet time-consuming aspect of software development. While recent LLM-based agentic approaches demonstrate promise, they often incur significant latency and cost due to complex multi-step reasoning and reliance on closed-source LLMs. Alternatively, traditional code ranking models, typically optimized for query-to-code or code-to-code retrieval, struggle with the verbose and failure-descriptive nature of issue localization queries. To bridge this gap, we introduce SweRank, an efficient and effective retrieve-and-rerank framework for software issue localization. To facilitate training, we construct SweLoc, a large-scale dataset curated from public GitHub repositories, featuring…
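The retrieve-and-rerank pattern named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the bag-of-words cosine retriever and the token-overlap reranker below are toy stand-ins for learned models, and every name here (`retrieve`, `rerank`, `localize`, the example functions) is hypothetical. The sketch only shows the control flow: a cheap first stage narrows the whole codebase to k candidates, and a second scorer reorders just those candidates.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Toy tokenizer: lowercase, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(issue, functions, k):
    # Stage 1 (stand-in for a learned retriever): score every function, keep the top k.
    query = Counter(tokenize(issue))
    scored = [(cosine(query, Counter(tokenize(body))), name)
              for name, body in functions.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def rerank(issue, candidates, functions):
    # Stage 2 (stand-in for a learned reranker): rescore only the k candidates
    # by the fraction of distinct issue tokens that appear in the function.
    query = set(tokenize(issue))

    def score(name):
        return len(query & set(tokenize(functions[name]))) / len(query) if query else 0.0

    return sorted(candidates, key=score, reverse=True)


def localize(issue, functions, k=2):
    # Full pipeline: retrieve a short candidate list, then rerank it.
    return rerank(issue, retrieve(issue, functions, k), functions)


# Hypothetical three-function "repository" and issue text, for illustration only.
functions = {
    "parse_date": "def parse_date(s): return datetime.strptime(s, '%Y-%m-%d')",
    "render_page": "def render_page(template, ctx): return template.format(**ctx)",
    "send_email": "def send_email(to, body): smtp.sendmail(to, body)",
}
issue = "parse_date crashes when the date string has a timezone"
ranking = localize(issue, functions, k=2)
```

The two-stage split is what keeps the approach cheap relative to multi-step agents: the more expensive scorer runs only on the k retrieved candidates rather than on every function in the repository.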