Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification

Aofan Liu; Shiyuan Song; Haoxuan Li; Cehao Yang; Yiyan Qi

arXiv:2510.24749 · cs.SE · October 30, 2025

TL;DR

This paper introduces a new repository-level code retrieval benchmark and a dual-encoder model, ReflectCode, that significantly improves retrieval accuracy by understanding cross-component change intents in complex codebases.

Contribution

The paper presents RepoAlign-Bench, a novel benchmark for repository-level code retrieval, and ReflectCode, an adversarial reflection-based dual-encoder architecture for enhanced context-aware code retrieval.

Findings

01. ReflectCode improves Top-5 Accuracy by 12.2%.

02. ReflectCode enhances Recall by 7.1%.

03. RepoAlign-Bench provides 52k annotated instances for repository-level retrieval.
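
For readers unfamiliar with the metrics named in the findings, the following is a minimal sketch of how Top-k Accuracy and Recall are commonly computed for a retrieval system. The exact definitions used in the paper are not given on this page, so the formulas below are assumptions based on standard usage, and the function names, file names, and toy data are hypothetical.

```python
from typing import Sequence, Set


def top_k_accuracy(ranked: Sequence[Sequence[str]],
                   relevant: Sequence[Set[str]],
                   k: int = 5) -> float:
    """Fraction of queries with at least one relevant item in the top-k results.

    Assumed (common) definition; the paper may define the metric differently.
    """
    hits = sum(
        1 for results, gold in zip(ranked, relevant)
        if any(item in gold for item in results[:k])
    )
    return hits / len(ranked)


def recall_at_k(ranked: Sequence[Sequence[str]],
                relevant: Sequence[Set[str]],
                k: int = 5) -> float:
    """Mean over queries of the share of relevant items found in the top-k results.

    Assumed (common) definition; the paper may define the metric differently.
    """
    per_query = [
        len(set(results[:k]) & gold) / len(gold)
        for results, gold in zip(ranked, relevant)
    ]
    return sum(per_query) / len(per_query)


# Hypothetical toy data: two change requests, each with a ranked list of files.
ranked = [["a.py", "b.py", "c.py"], ["x.py", "y.py", "z.py"]]
relevant = [{"b.py", "d.py"}, {"q.py"}]

print(top_k_accuracy(ranked, relevant, k=2))  # 0.5
print(recall_at_k(ranked, relevant, k=2))     # 0.25
```

In this toy case the first query retrieves one of its two relevant files within the top 2 and the second query retrieves none, which gives a Top-2 Accuracy of 0.5 and a Recall@2 of 0.25.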

Abstract

The escalating complexity of modern codebases has intensified the need for retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conventional function-level search paradigms. While recent studies have improved the alignment between natural language queries and code snippets, retrieving contextually relevant code for specific change requests remains largely underexplored. To address this gap, we introduce RepoAlign-Bench, the first benchmark specifically designed to evaluate repository-level code retrieval under change request driven scenarios, encompassing 52k annotated instances. This benchmark shifts the retrieval paradigm from function-centric matching to holistic repository-level reasoning. Furthermore, we propose ReflectCode, an adversarial reflection augmented dual-tower architecture featuring disentangled code_encoder and…
