CodeRetriever: Unimodal and Bimodal Contrastive Learning for Code Search
Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan

TL;DR
CodeRetriever is a contrastive pre-training framework for code search. It combines unimodal (code-code) and bimodal (code-text) contrastive objectives to learn function-level semantic representations from large-scale code and text data, and reports state-of-the-art results on eleven code search tasks across six programming languages.
Contribution
The paper presents a pre-trained model that combines unimodal and bimodal contrastive learning to improve function-level code representations and code search performance.
Findings
Achieves state-of-the-art results on eleven code search tasks.
Effective across six programming languages and various code granularities.
Significantly outperforms existing code pre-trained models.
Abstract
In this paper, we propose the CodeRetriever model, which learns the function-level code semantic representations through large-scale code-text contrastive pre-training. We adopt two contrastive learning schemes in CodeRetriever: unimodal contrastive learning and bimodal contrastive learning. For unimodal contrastive learning, we design an unsupervised learning approach to build semantic-related code pairs based on the documentation and function name. For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build code-text pairs. Both contrastive objectives can fully leverage large-scale code corpus for pre-training. Extensive experimental results show that CodeRetriever achieves new state-of-the-art with significant improvement over existing code pre-trained models, on eleven domain/language-specific code search tasks with six programming languages…
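The two contrastive objectives described in the abstract can be illustrated with a minimal sketch. It assumes an in-batch InfoNCE-style loss over cosine similarities, which is a common choice for contrastive pre-training; this summary does not state the paper's exact loss, temperature, or weighting, so the function names, the temperature value, the toy embeddings, and the unweighted sum below are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(queries, keys, temperature=0.05):
    """Mean InfoNCE loss: queries[i] and keys[i] are the positive pair,
    every other key in the batch serves as a negative.
    The temperature value is an assumption, not taken from the paper."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(queries)

# Toy 2-d embeddings standing in for encoder outputs (hypothetical data).
code_emb = [[1.0, 0.0], [0.0, 1.0]]        # functions
text_emb = [[0.9, 0.1], [0.1, 0.9]]        # their documentation or comments
paired_code_emb = [[0.8, 0.2], [0.2, 0.8]] # semantically related functions

bimodal_loss = info_nce(text_emb, code_emb)          # text-to-code pairs
unimodal_loss = info_nce(code_emb, paired_code_emb)  # code-to-code pairs
total_loss = unimodal_loss + bimodal_loss            # unweighted sum (assumption)
```

In this sketch, correctly aligned pairs yield a loss near zero, while shuffling the keys so that positives no longer line up raises the loss sharply, which is the signal a contrastive objective uses to pull related code and text together.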
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Contrastive Learning
