Universal Representation for Code

Linfeng Liu; Hoan Nguyen; George Karypis; Srinivasan Sengamedu

arXiv:2103.03116·cs.LG·March 5, 2021

Open Access

TL;DR

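This paper pre-trains graph neural networks on a novel graph-based code representation that captures semantic relationships between code elements (e.g., control flow and data flow), yielding representations that transfer across multiple code-related tasks and reach state-of-the-art results on method name prediction and code graph link prediction.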

Contribution

The work presents a universal code representation framework using graph neural networks and pre-training strategies, improving transferability and performance on various code understanding tasks.

Findings

01. Achieves state-of-the-art results on method name prediction.

02. Outperforms existing methods on code graph link prediction.

03. Visualizations reveal meaningful semantic properties in the representations.

Abstract

Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables the possibility of fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets -- spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code…
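
To make the idea of a graph that links code elements by control flow and data flow more concrete, here is a minimal Python sketch. It is not the paper's graph construction: the `Statement` class, the `build_code_graph` function, the edge labels, and the toy method are all hypothetical, and the sketch handles only straight-line code with hand-annotated variable definitions and uses.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement, with hand-annotated variables (hypothetical helper)."""
    text: str
    defines: set = field(default_factory=set)  # variables assigned here
    uses: set = field(default_factory=set)     # variables read here


def build_code_graph(statements):
    """Return (kind, src, dst) edges for a straight-line statement sequence.

    Assumption: no branches or loops, so control flow is simply
    "statement i-1 runs before statement i".
    """
    edges = []
    last_def = {}  # variable name -> index of the statement that last assigned it
    for i, stmt in enumerate(statements):
        if i > 0:
            edges.append(("control_flow", i - 1, i))
        # Data flow: connect the most recent definition of a variable to its use.
        for var in sorted(stmt.uses):
            if var in last_def:
                edges.append(("data_flow", last_def[var], i))
        for var in stmt.defines:
            last_def[var] = i
    return edges


# Toy method body, written as three annotated statements.
toy_method = [
    Statement("a = read()", defines={"a"}),
    Statement("b = a + 1", defines={"b"}, uses={"a"}),
    Statement("return a * b", uses={"a", "b"}),
]
edges = build_code_graph(toy_method)
for kind, src, dst in edges:
    print(f"{kind}: {toy_method[src].text!r} -> {toy_method[dst].text!r}")
```

In a setup like the one the abstract describes, a graph neural network would take nodes and typed edges of this general kind as input. How the authors actually extract nodes, edge types, and features is not specified in the text above.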

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Advanced Malware Detection Techniques