SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing
Wenchao Gu, Ensheng Shi, Yanlin Wang, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Michael R. Lyu

TL;DR
SECRET introduces a segmented deep hashing method that accelerates large-scale code retrieval by converting long hash codes into shorter segments, reducing retrieval time by at least 95% while maintaining comparable or higher accuracy.
Contribution
The paper proposes SECRET, a novel segmented deep hashing approach that enhances scalability and efficiency in code retrieval by using multiple hash code segments for faster lookup.
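As a rough illustration of why segment-based lookup can avoid scanning every entry, the sketch below splits each binary hash code into equal-length segments and keeps one hash table per segment position. It is a generic toy under stated assumptions, not the SECRET method itself: the exact-match rule on segments, the function names (`split_segments`, `build_index`, `lookup`), and the 16-bit example codes are all hypothetical, and this summary does not describe how the paper actually generates or matches segments.

```python
from collections import defaultdict


def split_segments(code: str, num_segments: int) -> list[str]:
    """Split a binary hash string into equal-length segments.

    Assumes len(code) is divisible by num_segments.
    """
    seg_len = len(code) // num_segments
    return [code[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]


def build_index(codes: dict[str, str], num_segments: int) -> list[dict]:
    """Build one table per segment position: segment value -> snippet ids."""
    tables = [defaultdict(set) for _ in range(num_segments)]
    for snippet_id, code in codes.items():
        for pos, seg in enumerate(split_segments(code, num_segments)):
            tables[pos][seg].add(snippet_id)
    return tables


def lookup(tables: list[dict], query_code: str) -> set[str]:
    """Collect snippets sharing at least one segment with the query.

    Each segment costs one dictionary lookup, so the work does not
    grow with a full pass over the database.
    """
    candidates: set[str] = set()
    for pos, seg in enumerate(split_segments(query_code, len(tables))):
        candidates |= tables[pos].get(seg, set())
    return candidates


# Hypothetical 16-bit hash codes for three code snippets.
database = {
    "snippet_a": "1010110000111100",
    "snippet_b": "1010001111000011",
    "snippet_c": "0101010101010101",
}
tables = build_index(database, num_segments=4)
candidates = lookup(tables, "1010111100110000")
```

Here `snippet_a` and `snippet_b` are recalled because each shares at least one 4-bit segment with the query at the same position, while `snippet_c` is never touched. A real system would typically re-rank such candidates with a more precise similarity score.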
Findings
Reduces retrieval time by at least 95%.
Achieves comparable or higher retrieval performance.
Outperforms classical LSH in efficiency and accuracy.
Abstract
Code retrieval, which retrieves code snippets based on users' natural language descriptions, is widely used by developers and plays a pivotal role in real-world software development. The advent of deep learning has shifted the retrieval paradigm from lexical-based matching towards leveraging deep learning models to encode source code and queries into vector representations, facilitating code retrieval according to vector similarity. Despite the effectiveness of these models, managing a large-scale code database presents significant challenges. Previous research proposes deep hashing-based methods, which generate hash codes for queries and code snippets and use Hamming distance for rapid recall of code candidates. However, this approach's reliance on linear scanning of the entire code base limits its scalability. To further improve the efficiency of large-scale code retrieval, we propose a…
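The linear-scan baseline that the abstract describes can be sketched as follows. This is a minimal illustration, assuming hash codes are stored as Python integers; the names `hamming_distance` and `linear_scan` and the 8-bit example codes are hypothetical and not taken from the paper.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def linear_scan(query: int, database: dict[str, int], top_k: int) -> list[str]:
    """Rank every snippet by Hamming distance to the query.

    Every entry is visited, so the cost grows linearly with the
    size of the code base.
    """
    ranked = sorted(database, key=lambda sid: hamming_distance(query, database[sid]))
    return ranked[:top_k]


# Hypothetical 8-bit hash codes for three code snippets.
database = {
    "snippet_a": 0b10101100,
    "snippet_b": 0b10101110,
    "snippet_c": 0b01010101,
}
top = linear_scan(0b10101111, database, top_k=2)
```

With these toy codes the query is 1 bit away from `snippet_b`, 2 bits from `snippet_a`, and 6 bits from `snippet_c`, so `top` holds `snippet_b` followed by `snippet_a`. The distance computation is cheap, but it is repeated for every stored snippet, which is the scalability limit the abstract points to.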
