SPENCER: Self-Adaptive Model Distillation for Efficient Code Retrieval

Wenchao Gu; Zongyi Lyu; Yanlin Wang; Hongyu Zhang; Cuiyun Gao; Michael R. Lyu

arXiv:2508.00546 · cs.SE · August 4, 2025

TL;DR

SPENCER introduces a self-adaptive model distillation framework that combines dual-encoder and cross-encoder architectures to enhance code retrieval accuracy while significantly reducing inference time.

Contribution

The paper presents a novel self-adaptive model distillation method with a teaching assistant selection strategy to improve efficiency and performance in code retrieval models.

Findings

1. The distilled model retains over 98% of the original model's performance.

2. Distillation reduces the inference time of the dual-encoder by 70%.

3. Overall accuracy improves over models that rely solely on a dual-encoder.

Abstract

Code retrieval aims to provide users with desired code snippets based on users' natural language queries. With the development of deep learning technologies, adopting pre-trained models for this task has become mainstream. Considering the retrieval efficiency, most of the previous approaches adopt a dual-encoder for this task, which encodes the description and code snippet into representation vectors, respectively. However, the model structure of the dual-encoder tends to limit the model's performance, since it lacks the interaction between the code snippet and description at the bottom layer of the model during training. To improve the model's effectiveness while preserving its efficiency, we propose a framework, which adopts Self-AdaPtive Model Distillation for Efficient CodE Retrieval, named SPENCER. SPENCER first adopts the dual-encoder to narrow the search space and then adopts the…
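The abstract is cut off before it describes the second stage. Assuming, as the TL;DR suggests, that a cross-encoder re-scores the shortlist produced by the dual-encoder, the recall-then-rerank flow can be sketched as below. This is a minimal illustration and not the authors' implementation: the hand-written vectors stand in for dual-encoder embeddings, and `token_overlap` is a hypothetical placeholder for a cross-encoder that scores a query and a snippet jointly.

```python
import math
import re
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_top_k(query_vec: Vector, code_vecs: List[Vector], k: int) -> List[int]:
    """Stage 1 (dual-encoder style): rank precomputed code vectors, keep top k."""
    order = sorted(
        range(len(code_vecs)),
        key=lambda i: cosine(query_vec, code_vecs[i]),
        reverse=True,
    )
    return order[:k]


def rerank(
    query: str,
    snippets: List[str],
    candidates: List[int],
    pair_scorer: Callable[[str, str], float],
) -> List[int]:
    """Stage 2 (assumed cross-encoder style): re-score only the shortlist."""
    return sorted(candidates, key=lambda i: pair_scorer(query, snippets[i]), reverse=True)


def token_overlap(query: str, code: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words in the code."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    c = set(re.findall(r"[a-z]+", code.lower()))
    return len(q & c) / len(q) if q else 0.0


# Toy corpus; the vectors are made up to play the role of offline embeddings.
snippets = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
    "def write_file(path, data): open(path, 'w').write(data)",
]
code_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]

query = "write data to file"
query_vec = [1.0, 0.2]  # made-up query embedding

shortlist = recall_top_k(query_vec, code_vecs, k=2)
ranked = rerank(query, snippets, shortlist, token_overlap)
print(shortlist, ranked)
```

In this toy run the vector stage places the `read_file` snippet first, and the pairwise scorer then promotes `write_file`. The expensive joint scorer only sees the `k` shortlisted candidates, which is why a two-stage design can stay efficient as the corpus grows.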

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Software Engineering Research