Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Chen Wu; Ming Yan

arXiv:2201.11313 · cs.CL · January 28, 2022 · 6 cites

TL;DR

This paper introduces a novel deep semantic model for code search that effectively bridges natural language and code semantics, achieving top performance on the CodeSearchNet benchmark.

Contribution

The paper proposes a new deep semantic model utilizing multi-modal sources, self-attention, and combined representations, advancing neural code search techniques.

Findings

01. Achieved 0.384 NDCG on CodeSearchNet benchmark

02. Won first place in the CodeSearchNet challenge

03. Enhanced representation learning through multi-modal and cross-lingual alignment
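
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of the standard NDCG definition (exponential gain, log2 rank discount). It is an illustration only: the function names `dcg` and `ndcg` are hypothetical, and the CodeSearchNet evaluation script may differ in details such as how unannotated results are handled.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for relevance grades listed in ranked order."""
    # Rank positions are 1-based in the formula, hence log2(index + 2).
    return sum((2 ** rel - 1) / math.log2(index + 2)
               for index, rel in enumerate(relevances))


def ndcg(ranked_relevances):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all: define NDCG as 0
    return dcg(ranked_relevances) / ideal_dcg


# Hypothetical relevance grades (0 = irrelevant, 3 = exact match)
# for the snippets a model returned for one query, in ranked order.
print(ndcg([3, 0, 2, 1]))
```

A score of 1.0 means the model returned results in the best possible order; the benchmark figure is an average over queries.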

Abstract

Semantic code search is the task of retrieving a relevant code snippet given a natural language query. Unlike typical information retrieval tasks, code search requires bridging the semantic gap between programming language and natural language in order to better describe intrinsic concepts and semantics. Recently, deep neural networks for code search have been a hot research topic. Typical methods for neural code search first represent the code snippet and query text as separate embeddings, and then use a vector similarity measure (e.g. dot product or cosine) to calculate the semantic similarity between them. There exist many different ways of aggregating variable-length sequences of code or query tokens into a learnable embedding, including bi-encoder, cross-encoder, and poly-encoder. The goal of the query encoder and code encoder is to produce embeddings that are close to each other for a…
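
The bi-encoder scoring step described in the abstract can be sketched as follows. This is a toy illustration, not the paper's model: mean pooling stands in for the learned aggregation, and the token vectors, snippet names, and helper functions (`mean_pool`, `cosine`, `rank_snippets`) are all made up for the example.

```python
import math


def mean_pool(token_vectors):
    """Collapse a variable-length list of token vectors into one fixed-size vector.

    Assumption: simple averaging replaces the learned aggregation in a real encoder.
    """
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_snippets(query_embedding, code_embeddings):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine(query_embedding, emb))
              for name, emb in code_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-d token embeddings; a trained encoder would produce these.
query = mean_pool([[1.0, 0.0], [1.0, 0.0]])
snippets = {
    "read_file": mean_pool([[1.0, 0.0], [0.8, 0.2]]),
    "sort_list": mean_pool([[0.0, 1.0]]),
}
for name, score in rank_snippets(query, snippets):
    print(name, round(score, 3))
```

Because query and code are embedded independently, code embeddings can be computed once and reused across queries, which is the usual reason for choosing a bi-encoder over a cross-encoder.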

Code & Models

Repositories

overwindows/semanticcodesearch
tf · Official

Taxonomy

Topics: Software Engineering Research · Web Data Mining and Analysis · Software Testing and Debugging Techniques