MGS3: A Multi-Granularity Self-Supervised Code Search Framework
Rui Li, Junfeng Kang, Qi Liu, Liyang He, Zheng Zhang, Yunhao Sha, Linbo Zhu, Zhenya Huang

TL;DR
This paper introduces MGS3, a multi-granularity self-supervised framework for code search that uses hierarchical representations and contrastive learning across code granularities to improve retrieval accuracy.
Contribution
The paper presents a multi-granularity self-supervised contrastive learning framework and a dataset, MGCodeSearchNet, of 536K+ pairs of natural language and code snippets, to close the performance gap on fine-grained code search.
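To make "multi-granularity" concrete, the sketch below splits one Python function into function-level, block-level, and statement-level snippets using the standard `ast` module. It is only an illustration: the helper name `split_granularities`, the choice of which node types count as a block, and the example function are all assumptions, and the rules actually used to build the paper's dataset are not given in this summary.

```python
import ast

# Assumption: these compound statements are treated as "blocks".
# The paper's own definition of a block is not stated in this summary.
BLOCK_NODES = (ast.For, ast.While, ast.If, ast.With, ast.Try)


def split_granularities(source: str) -> dict[str, list[str]]:
    """Hypothetical helper: group snippets by granularity level."""
    tree = ast.parse(source)
    out = {"function": [], "block": [], "statement": []}
    for node in ast.walk(tree):
        # Recover the exact source text covered by this node.
        segment = ast.get_source_segment(source, node)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            out["function"].append(segment)
        elif isinstance(node, BLOCK_NODES):
            out["block"].append(segment)
        elif isinstance(node, ast.stmt):
            # Any remaining simple statement (assignment, return, ...).
            out["statement"].append(segment)
    return out


example = '''def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
'''

snippets = split_granularities(example)
for level, items in snippets.items():
    print(level, len(items))
```

Each snippet could then be paired with a natural language description at its own level, which is the kind of pairing a multi-granularity search dataset needs.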
Findings
Outperforms existing methods on multiple code search benchmarks.
Demonstrates effectiveness across various granularities and model architectures.
Shows compatibility with pre-trained code representation models.
Abstract
In the pursuit of enhancing software reusability and developer productivity, code search has emerged as a key area, aimed at retrieving code snippets relevant to functionalities based on natural language queries. Despite significant progress in self-supervised code pre-training utilizing the vast amount of code data in repositories, existing methods have primarily focused on leveraging contrastive learning to align natural language with function-level code snippets. These studies have overlooked the abundance of fine-grained (such as block-level and statement-level) code snippets prevalent within the function-level code snippets, which results in suboptimal performance across all levels of granularity. To address this problem, we first construct a multi-granularity code search dataset called MGCodeSearchNet, which contains 536K+ pairs of natural language and code snippets. Subsequently,…
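The abstract refers to contrastive learning that aligns natural language with code snippets. A minimal sketch of one common form of that idea, the in-batch InfoNCE loss, is shown below. It is an assumption that this resembles the objective used here, since the truncated abstract does not state the loss; the function names, the temperature value, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query_vecs, code_vecs, temperature=0.05):
    """Mean in-batch InfoNCE loss; query i is the positive for code i.

    Every other code snippet in the batch acts as a negative.
    """
    losses = []
    for i, q in enumerate(query_vecs):
        logits = [cosine(q, c) / temperature for c in code_vecs]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (assumed, not from the paper).
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each code vector near its query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each code vector near the wrong query

loss_aligned = info_nce(queries, aligned)
loss_swapped = info_nce(queries, swapped)
print(loss_aligned < loss_swapped)
```

The loss is lower when each query embedding sits closest to its own code snippet. In a multi-granularity setting, the same kind of loss could in principle be computed separately for function-, block-, and statement-level pairs.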
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Contrastive Learning · ALIGN
