Revisiting Code Search in a Two-Stage Paradigm

Fan Hu; Yanlin Wang; Lun Du; Xirong Li; Hongyu Zhang; Shi Han; Dongmei Zhang

arXiv:2208.11274 · cs.SE · March 29, 2024

TL;DR

This paper introduces TOSS, a two-stage code search framework that combines information retrieval (IR), bi-encoder, and cross-encoder models to improve the accuracy and efficiency of code retrieval across multiple programming languages.
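
The two-stage idea in the TL;DR can be sketched as a recall-then-rerank function. This is a minimal illustration under stated assumptions, not the TOSS implementation: `overlap_score` and `substring_score` are hypothetical toy stand-ins for the first-stage (IR or bi-encoder) and second-stage (cross-encoder) models, and merging the first-stage results by taking the union of each model's top-k is an assumption made here for illustration.

```python
from typing import Callable, List, Sequence

# A scorer maps (query, code) to a relevance score; higher means more relevant.
Scorer = Callable[[str, str], float]


def two_stage_search(query: str, corpus: Sequence[str], recall_scorers: List[Scorer],
                     rerank_scorer: Scorer, k: int = 10) -> List[int]:
    """Return corpus indices ranked by a recall-then-rerank pipeline."""
    # Stage 1: each cheap scorer nominates its top-k snippets; the candidate
    # pool is the union of the nominations (an assumption, not TOSS's rule).
    candidates = set()
    for score in recall_scorers:
        ranked = sorted(range(len(corpus)),
                        key=lambda i: score(query, corpus[i]), reverse=True)
        candidates.update(ranked[:k])
    # Stage 2: only the pooled candidates reach the expensive scorer.
    # Sorting the indices first makes tie-breaking deterministic.
    return sorted(sorted(candidates),
                  key=lambda i: rerank_scorer(query, corpus[i]), reverse=True)


def overlap_score(query: str, code: str) -> float:
    # Hypothetical IR-style stand-in: number of shared lowercase tokens.
    code_tokens = code.lower().replace("_", " ").replace("(", " ").split()
    return float(len(set(query.lower().split()) & set(code_tokens)))


def substring_score(query: str, code: str) -> float:
    # Hypothetical cross-encoder stand-in: counts query-word occurrences in the code.
    return float(sum(code.count(word) for word in query.split()))


corpus = [
    "def read_file(path): return open(path).read()",
    "def add(a, b): return a + b",
    "def write_file(path, text): open(path, 'w').write(text)",
]
result = two_stage_search("read file", corpus, [overlap_score], substring_score, k=2)
print(result)  # [0, 2]
```

In a real system the list passed as `recall_scorers` would hold both an IR model and a bi-encoder, so that snippets missed by one recaller can still be nominated by the other before reranking.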

Contribution

TOSS effectively fuses different code search methods in a two-stage process, achieving state-of-the-art accuracy with improved efficiency over existing approaches.
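
This page does not say how TOSS combines the scores of its first-stage models. As a hedged illustration of score-level fusion in general, the sketch below swaps in CombSUM with min-max normalization, a classic data fusion method; it is not claimed to be TOSS's fusion rule. The function names and the scores are hypothetical toy values.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so different models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs: List[Dict[str, float]]) -> List[str]:
    """CombSUM: add each snippet's normalized scores across runs, rank by the total."""
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first; ties broken by snippet id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Toy scores: an IR-style run on an unbounded scale and a bi-encoder-style
# run on a cosine-like scale.
ir_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
bi_scores = {"a": 0.30, "b": 0.90, "c": 0.10}
print(comb_sum([ir_scores, bi_scores]))  # ['b', 'a', 'c']
```

Normalizing before summing matters here: without it, the IR run's larger numeric range would dominate and snippet "a" would win regardless of the bi-encoder's preference for "b".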

Findings

1. TOSS outperforms baseline models, reaching a mean reciprocal rank (MRR) of 0.763.
2. TOSS is both efficient and effective across multiple programming languages.
3. TOSS achieves better results than six other data fusion methods.
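
Mean reciprocal rank averages, over all queries, the reciprocal of the rank at which the first correct snippet appears: MRR = (1/|Q|) Σ 1/rank_q. A minimal sketch, assuming 1-based ranks and that a query whose correct snippet is never retrieved contributes 0; the function name is hypothetical and the ranks are toy values.

```python
from typing import List, Optional


def mean_reciprocal_rank(first_hit_ranks: List[Optional[int]]) -> float:
    """Average of 1/rank of the first correct snippet per query.

    Ranks are 1-based; None means the correct snippet was not retrieved,
    which contributes 0 to the average.
    """
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    total = sum(1.0 / r for r in first_hit_ranks if r is not None)
    return total / len(first_hit_ranks)


# Four toy queries: correct code found at ranks 1, 2 and 4, and once not at all.
print(mean_reciprocal_rank([1, 2, 4, None]))  # 0.4375
```

Because the reciprocal drops quickly (1, 0.5, 0.33, ...), MRR rewards placing the correct snippet at the very top, which is exactly what a second-stage reranker targets.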

Abstract

With a good code search engine, developers can reuse existing code snippets and accelerate the software development process. Current code search methods fall into two categories: traditional information retrieval (IR) based approaches and deep learning (DL) based approaches. DL-based approaches include the cross-encoder paradigm and the bi-encoder paradigm. However, each of these approaches has limitations. IR-based and bi-encoder models are fast at inference but not accurate enough, while cross-encoder models achieve higher search accuracy but consume more time. In this work, we propose TOSS, a two-stage fusion code search framework that combines the advantages of different code search methods. TOSS first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates, and then uses fine-grained cross-encoders for finer ranking.…
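
The speed trade-off in the abstract can be made concrete by counting expensive model invocations per query. A bi-encoder embeds every code snippet once, offline, so a query needs only one encoder call plus cheap dot products; a cross-encoder must read every (query, code) pair jointly; a two-stage pipeline pays one query encoding plus k cross-encoder calls. This is a hedged sketch: `toy_embed` and `toy_cross` are hypothetical stand-ins rather than TOSS's models, and the counts ignore that a real cross-encoder call is also heavier than a dot product.

```python
import heapq
from typing import Callable, Dict, List, Sequence


class CallCounter:
    """Wraps a model function and counts how many times it is invoked."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def toy_embed(text: str) -> List[int]:
    # Hypothetical bi-encoder stand-in: a letter-frequency vector.
    return [text.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]


def toy_cross(query: str, code: str) -> float:
    # Hypothetical cross-encoder stand-in: reads query and code together.
    return float(sum(code.count(word) for word in query.split()))


def dot(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(a * b for a, b in zip(u, v))


def per_query_model_calls(query: str, corpus: Sequence[str], k: int) -> Dict[str, int]:
    encode = CallCounter(toy_embed)
    code_vecs = [encode(c) for c in corpus]  # offline indexing, reused for every query
    encode.calls = 0  # count only the work done at query time

    # Cross-encoder alone: one joint forward pass per snippet in the corpus.
    cross_only = CallCounter(toy_cross)
    for c in corpus:
        cross_only(query, c)

    # Two-stage: encode the query once, rank by cheap dot products,
    # then run the cross-encoder on the top-k candidates only.
    q_vec = encode(query)
    top_k = heapq.nlargest(k, range(len(corpus)), key=lambda i: dot(q_vec, code_vecs[i]))
    rerank = CallCounter(toy_cross)
    for i in top_k:
        rerank(query, corpus[i])

    return {"cross_encoder_only": cross_only.calls,
            "two_stage_query_encodings": encode.calls,
            "two_stage_cross_encoder": rerank.calls}


corpus = [f"def f{i}(x): return x * {i}" for i in range(1000)]
print(per_query_model_calls("multiply x", corpus, k=10))
# {'cross_encoder_only': 1000, 'two_stage_query_encodings': 1, 'two_stage_cross_encoder': 10}
```

With 1,000 snippets and k = 10, the two-stage pipeline makes 11 model calls per query instead of 1,000, while still letting the more accurate model decide the final order of the top candidates.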

Code & Models

Repositories

fly-dragon211/TOSS (PyTorch, official)

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling