TL;DR
This paper introduces a new repository-level code retrieval benchmark and a dual-encoder model, ReflectCode, that significantly improves retrieval accuracy by understanding cross-component change intents in complex codebases.
Contribution
The paper presents RepoAlign-Bench, a novel benchmark for repository-level code retrieval, and ReflectCode, an adversarial reflection-based dual-encoder architecture for enhanced context-aware code retrieval.
Findings
ReflectCode improves Top-5 Accuracy by 12.2%.
ReflectCode enhances Recall by 7.1%.
RepoAlign-Bench provides 52k annotated instances for repository-level retrieval.
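The Findings cite gains in Top-5 Accuracy and Recall but do not define either metric here, or say whether the gains are absolute percentage points or relative improvements. The minimal sketch below uses the common textbook definitions, which may differ from the paper's. It assumes Top-k Accuracy is the fraction of queries with at least one relevant item among the top k results, and Recall is recall@k averaged over queries. The function names are hypothetical.

```python
from typing import Sequence, Set


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   relevant: Sequence[Set[str]], k: int = 5) -> float:
    """Fraction of queries whose top-k results contain at least one relevant item."""
    hits = sum(1 for ranked, gold in zip(rankings, relevant)
               if gold & set(ranked[:k]))
    return hits / len(rankings)


def recall_at_k(rankings: Sequence[Sequence[str]],
                relevant: Sequence[Set[str]], k: int = 5) -> float:
    """Mean over queries of |relevant ∩ top-k| / |relevant| (assumed definition)."""
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        total += len(gold & set(ranked[:k])) / len(gold)
    return total / len(rankings)


# Two toy queries: ranked retrieval results and their gold (relevant) items.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
relevant = [{"b"}, {"q", "z"}]
print(top_k_accuracy(rankings, relevant, k=2))  # 0.5
print(recall_at_k(rankings, relevant, k=3))     # 0.75
```

The two metrics diverge when a query has several relevant items. Top-k Accuracy counts any single hit as success, while recall@k rewards retrieving all of them. This distinction matters for repository-level change requests, which can touch multiple files.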
Abstract
The escalating complexity of modern codebases has intensified the need for retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conventional function-level search paradigms. While recent studies have improved the alignment between natural language queries and code snippets, retrieving contextually relevant code for specific change requests remains largely underexplored. To address this gap, we introduce RepoAlign-Bench, the first benchmark specifically designed to evaluate repository-level code retrieval under change-request-driven scenarios, encompassing 52k annotated instances. This benchmark shifts the retrieval paradigm from function-centric matching to holistic repository-level reasoning. Furthermore, we propose ReflectCode, an adversarial-reflection-augmented dual-tower architecture featuring disentangled code_encoder and…
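The abstract is cut off before it describes ReflectCode in detail. It only says the model is a dual-tower architecture with disentangled encoders. The sketch below illustrates the generic dual-tower retrieval pattern, not ReflectCode itself. Two independent encoders map queries and code into a shared vector space, code vectors are computed once ahead of time, and candidates are ranked by cosine similarity. The token-count "encoders" `query_tower` and `code_tower` are hypothetical stand-ins for learned neural encoders, and the adversarial reflection component is not modeled at all.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse embedding: token -> weight


def _normalize(counts: Counter) -> Vector:
    """L2-normalize token counts so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def query_tower(text: str) -> Vector:
    # Hypothetical query encoder: lowercase alphanumeric word tokens.
    return _normalize(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def code_tower(snippet: str) -> Vector:
    # Hypothetical code encoder with its own tokenization: splits snake_case
    # and camelCase identifiers into sub-words before counting.
    parts = re.findall(r"[A-Za-z][a-z0-9]*|[0-9]+", snippet)
    return _normalize(Counter(p.lower() for p in parts))


def build_index(corpus: Dict[str, str]) -> Dict[str, Vector]:
    # Code embeddings do not depend on the query, so they are precomputed once.
    return {path: code_tower(src) for path, src in corpus.items()}


def retrieve(query: str, index: Dict[str, Vector], k: int = 5) -> List[Tuple[str, float]]:
    q = query_tower(query)
    scored = [(path, sum(w * emb.get(tok, 0.0) for tok, w in q.items()))
              for path, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]


corpus = {
    "config/loader.py": "def load_config(path): return parse_yaml(path)",
    "auth/session.py": "def refresh_session(token): return renew_token(token)",
}
index = build_index(corpus)
print(retrieve("load config from yaml path", index, k=2))
```

The independence of the two towers is the design point. Because a code vector never sees the query, an entire repository can be embedded offline, and query time reduces to one encoder pass plus a similarity search.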