code2vec: Learning Distributed Representations of Code

Uri Alon; Meital Zilberstein; Omer Levy; Eran Yahav

arXiv:1803.09473 · cs.LG · October 31, 2018

TL;DR

This paper introduces code2vec, a neural model that represents a code snippet as a single fixed-length vector by decomposing the snippet into paths in its abstract syntax tree. The resulting vector can be used to predict semantic properties of the snippet, such as a method's name.

Contribution

The paper presents a neural approach that builds code embeddings from syntax tree paths and improves method name prediction accuracy over previous techniques.

Findings

01

Achieved a relative improvement of over 75% in method name prediction compared with previous techniques.

02

Predicted method names in files that were completely unobserved during training.

03

Learned semantically rich method name vectors that capture similarities and analogies.

Abstract

We present a neural model for representing snippets of code as continuous distributed vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed-length code vector, which can be used to predict semantic properties of the snippet. This is performed by decomposing code to a collection of paths in its abstract syntax tree, and learning the atomic representation of each path simultaneously with learning how to aggregate a set of them. We demonstrate the effectiveness of our approach by using it to predict a method's name from the vector representation of its body. We evaluate our approach by training a model on a dataset of 14M methods. We show that code vectors trained on this dataset can predict method names from files that were completely unobserved during training. Furthermore, we show that our model learns useful method name…
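The two steps named in the abstract, decomposing a snippet into syntax tree paths and aggregating them into one vector, can be illustrated with a small sketch. This is not the authors' implementation. It assumes Python's standard `ast` module as a stand-in parser, treats only names, constants, and arguments as leaves, and aggregates with a softmax attention over path-contexts, which the page text above does not spell out. All function names (`extract_path_contexts`, `code_vector`), the `^`/`_` path notation, and the random, untrained embeddings are hypothetical choices made for illustration.

```python
import ast
import math
import random
from itertools import combinations

# Assumption: these node types act as the leaves ("terminals") of the tree.
TERMINALS = (ast.Name, ast.Constant, ast.arg)


def token_of(node):
    """Return the surface token carried by a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)  # ast.Constant


def terminals_with_ancestry(tree):
    """Collect (token, [root, ..., terminal]) for every terminal, left to right."""
    found = []

    def visit(node, ancestry):
        ancestry = ancestry + [node]
        if isinstance(node, TERMINALS):
            found.append((token_of(node), ancestry))
            return  # do not descend below a terminal
        for child in ast.iter_child_nodes(node):
            visit(child, ancestry)

    visit(tree, [])
    return found


def path_between(anc_a, anc_b):
    """Node-type path: up from leaf a to the lowest common ancestor, then down to leaf b."""
    i = 0
    while i < min(len(anc_a), len(anc_b)) and anc_a[i] is anc_b[i]:
        i += 1
    up = [type(n).__name__ for n in reversed(anc_a[i:])]
    top = type(anc_a[i - 1]).__name__  # lowest common ancestor
    down = [type(n).__name__ for n in anc_b[i:]]
    return "^".join(up + [top]) + "_" + "_".join(down)


def extract_path_contexts(source):
    """Return (start_token, path, end_token) for every pair of terminals."""
    terms = terminals_with_ancestry(ast.parse(source))
    return [(ta, path_between(aa, ab), tb)
            for (ta, aa), (tb, ab) in combinations(terms, 2)]


def code_vector(contexts, dim=4, seed=0):
    """Attention-weighted sum of context vectors, using random (untrained) parameters."""
    rng = random.Random(seed)
    table = {}

    def embed(symbol):
        if symbol not in table:
            table[symbol] = [rng.uniform(-1, 1) for _ in range(dim)]
        return table[symbol]

    # Each context vector concatenates token, path, and token embeddings.
    vectors = [embed(("tok", s)) + embed(("path", p)) + embed(("tok", t))
               for s, p, t in contexts]
    attention = [rng.uniform(-1, 1) for _ in range(3 * dim)]
    scores = [sum(x * a for x, a in zip(v, attention)) for v in vectors]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    vector = [sum(w * v[k] for w, v in zip(weights, vectors))
              for k in range(3 * dim)]
    return vector, weights


contexts = extract_path_contexts("def f(x):\n    return x + 1\n")
vector, weights = code_vector(contexts)
```

For the one-line function above, the sketch yields three path-contexts, one of which is `("x", "Name^BinOp_Constant", "1")`. In a trained model the embeddings and the attention vector would be learned jointly, which is the "simultaneously" in the abstract; here they are random, so the resulting vector carries no meaning beyond showing the data flow.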
