Universal Representation for Code
Linfeng Liu, Hoan Nguyen, George Karypis, Srinivasan Sengamedu

TL;DR
This paper introduces pre-training strategies on top of a novel graph-based code representation that captures semantic relationships such as control flow and data flow. The resulting representations transfer across code-related tasks and reach state-of-the-art results on method name prediction and code graph link prediction.
Contribution
The work presents a universal code representation framework: graph neural networks are pre-trained on a graph-based code representation and then fine-tuned for downstream applications, which improves transferability across code understanding tasks.
Findings
Achieves state-of-the-art results on method name prediction.
Outperforms existing methods on code graph link prediction.
Visualizations reveal meaningful semantic properties in the representations.
Abstract
Learning from source code usually requires a large amount of labeled data. Despite the possible scarcity of labeled data, the trained model is highly task-specific and lacks transferability to different tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation, to produce universal representations for code. Specifically, our graph-based representation captures important semantics between code elements (e.g., control flow and data flow). We pre-train graph neural networks on the representation to extract universal code properties. The pre-trained model then enables the possibility of fine-tuning to support various downstream applications. We evaluate our model on two real-world datasets -- spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code…
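To make the idea of a graph-based code representation concrete, here is a minimal sketch. It is not the authors' representation or pipeline: the function name `build_code_graph`, the edge labels `ast_child`, `next_stmt`, and `data_flow`, and the use of Python's standard `ast` module are all assumptions chosen for illustration. It approximates control flow by linking each statement to the next one in the same block, and data flow by linking each variable read to the most recent assignment of that name.

```python
import ast
from collections import defaultdict


def build_code_graph(source):
    """Build a toy code graph (illustrative only, not the paper's method).

    Returns (nodes, edges):
      nodes: list of AST node type names, indexed by node id
      edges: dict mapping an edge type to a list of (src id, dst id) pairs
    """
    tree = ast.parse(source)
    nodes = []                 # node id -> AST node type name
    edges = defaultdict(list)  # edge type -> list of (src id, dst id)
    last_def = {}              # variable name -> id of its latest Store node

    def visit(node, parent=None):
        nid = len(nodes)
        nodes.append(type(node).__name__)
        if parent is not None:
            edges["ast_child"].append((parent, nid))

        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load) and node.id in last_def:
                # Data flow: previous definition -> this read.
                edges["data_flow"].append((last_def[node.id], nid))
            elif isinstance(node.ctx, ast.Store):
                last_def[node.id] = nid

        # Visit the right-hand side of an assignment before its targets,
        # so `x = x + 1` links the read of x to the earlier definition.
        fields = list(ast.iter_fields(node))
        if isinstance(node, (ast.Assign, ast.AnnAssign, ast.AugAssign)):
            fields.sort(key=lambda kv: kv[0] != "value")

        for _, value in fields:
            children = value if isinstance(value, list) else [value]
            prev_stmt = None
            for child in children:
                if not isinstance(child, ast.AST):
                    continue
                if isinstance(child, ast.expr_context):
                    continue  # Load/Store markers carry no structure
                cid = visit(child, nid)
                if isinstance(child, ast.stmt):
                    # Approximate control flow: statement -> next statement.
                    if prev_stmt is not None:
                        edges["next_stmt"].append((prev_stmt, cid))
                    prev_stmt = cid
        return nid

    visit(tree)
    return nodes, dict(edges)


nodes, edges = build_code_graph("x = 1\ny = x + 2\nx = x + y\n")
for kind in ("next_stmt", "data_flow"):
    print(kind, [(nodes[s], nodes[d]) for s, d in edges[kind]])
```

The sketch ignores branching, loops, and function parameters, so its edges are only a rough stand-in for real control-flow and data-flow analysis. The typed edge lists it produces have the general shape a graph neural network library would consume.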
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Advanced Malware Detection Techniques
