Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations

Nghi D. Q. Bui; Yijun Yu; Lingxiao Jiang

arXiv:2009.02731 · cs.SE · May 25, 2021

TL;DR

Corder is a self-supervised contrastive learning framework that pre-trains source code models on semantic-preserving transformations of code. The pre-trained model supports code retrieval without labeled data and can be fine-tuned for tasks such as code summarization that may still need some labels.

Contribution

The paper introduces Corder, a novel self-supervised contrastive learning approach for source code that leverages semantic-preserving transformations, reducing the need for labeled data in code tasks.
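
As a rough illustration of what a semantic-preserving transformation can look like, the sketch below renames the parameters and local variables of a small function and checks that both versions return the same result. Python and its standard `ast` module are used purely for illustration. The helper names `rename_variables` and `run`, the `var_N` naming scheme, and the choice of variable renaming as the operator are assumptions of this sketch, not details taken from this page. It needs Python 3.9 or later for `ast.unparse`.

```python
import ast

ORIGINAL = """
def sum_of_squares(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total
"""


def rename_variables(source: str) -> str:
    """Hypothetical operator: rename parameters and assigned variables to var_0, var_1, ...

    Function names, attributes, and builtins such as `range` are left alone.
    This toy version ignores edge cases (keyword-argument call sites, `global`
    statements, existing names that already look like var_N).
    """
    tree = ast.parse(source)

    # Pass 1: collect parameter names and every name that is assigned to.
    names = []
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and node.arg not in names:
            names.append(node.arg)
        elif (isinstance(node, ast.Name)
              and isinstance(node.ctx, ast.Store)
              and node.id not in names):
            names.append(node.id)
    mapping = {old: f"var_{i}" for i, old in enumerate(names)}

    # Pass 2: rewrite every occurrence of the collected names.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg):
            node.arg = mapping[node.arg]
        elif isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]

    return ast.unparse(tree)


def run(source: str, func_name: str, *args):
    """Execute a snippet in a fresh namespace and call one of its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


transformed = rename_variables(ORIGINAL)
print(transformed)

# The two snippets differ textually but compute the same value.
same = run(ORIGINAL, "sum_of_squares", [1, 2, 3]) == run(transformed, "sum_of_squares", [1, 2, 3])
print("same behaviour:", same)
```

The original and the transformed snippet form the kind of "syntactically diverse but semantically equivalent" pair that a contrastive objective can treat as similar, while unrelated snippets serve as dissimilar examples.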

Findings

01 Outperforms baselines in code retrieval and summarization tasks

02 Effective in low-resource or unlabeled data scenarios

03 Enhances code understanding through semantic-preserving transformations

Abstract

We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks. The pre-trained Corder model can be used in two ways: (1) it can produce vector representations of code, which can be applied to code retrieval tasks that do not have labeled data; (2) it can be used in a fine-tuning process for tasks that might still require labeled data, such as code summarization. The key innovation is that we train the source code model by asking it to recognize similar and dissimilar code snippets through a contrastive learning objective. To do so, we use a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent. Through extensive experiments, we have shown that the code models pretrained by…
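
To make the idea of "recognizing similar and dissimilar code snippets" concrete, here is a minimal sketch of one common contrastive objective: a temperature-scaled softmax over cosine similarities. The abstract does not state which loss Corder uses, so this particular form, the function names `cosine` and `contrastive_loss`, the temperature value, and the toy 2-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a code encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Loss is low when the anchor is closer to the positive than to the negatives.

    anchor:    embedding of a code snippet
    positive:  embedding of a transformed, semantically equivalent snippet
    negatives: embeddings of unrelated snippets
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (assumed values, not real model output).
anchor = [1.0, 0.0]
equivalent = [0.9, 0.1]               # stands in for a transformed version of the anchor
unrelated = [[0.0, 1.0], [-1.0, 0.0]]

# Well-aligned case: the equivalent snippet is the positive.
good = contrastive_loss(anchor, equivalent, unrelated)

# Misaligned case: an unrelated snippet is (wrongly) treated as the positive.
bad = contrastive_loss(anchor, unrelated[0], [equivalent, unrelated[1]])

print(f"aligned loss:    {good:.3f}")
print(f"misaligned loss: {bad:.3f}")
```

Minimizing a loss of this shape pushes the encoder to place equivalent snippets near each other in vector space, which is also what makes the resulting vectors usable for nearest-neighbour code retrieval without labels.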

Taxonomy

Methods: Contrastive Learning