Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Milan Cvitkovic; Badal Singh; Anima Anandkumar

arXiv:1810.08305·cs.LG·May 21, 2019·21 cites

TL;DR

This paper introduces a graph-structured cache to handle open vocabulary challenges in source code modeling, significantly improving performance on code completion and variable naming tasks using graph neural networks.

Contribution

It proposes a novel graph-structured cache mechanism that enhances neural models' ability to handle open vocabulary in source code, improving accuracy in code understanding tasks.

Findings

01

Over 100% relative improvement in variable naming accuracy

02

Enhanced performance on code completion tasks

03

Moderate increase in computation time

Abstract

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task (with over 100% relative improvement on the latter) at the cost of a moderate increase in…
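
The cache structure described in the abstract (one node per word, with edges to each of its occurrences in the code) can be illustrated with a small sketch. This is not the authors' implementation: it works on a flat token stream from Python's standard `tokenize` module instead of the program graph a Graph Neural Network model would use, and the function name `build_word_cache` and the returned data layout are hypothetical choices made for illustration.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def build_word_cache(source: str):
    """Toy graph-structured cache (illustrative assumption, not the paper's code).

    Returns:
        occurrences: list of (word, (row, col)), one occurrence node per identifier token.
        edges: dict mapping each word node to the indices of its occurrence nodes.
    """
    occurrences = []
    edges = defaultdict(list)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only identifiers form the open vocabulary; language keywords are skipped.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # Connect the word node to the occurrence node about to be appended.
            edges[tok.string].append(len(occurrences))
            occurrences.append((tok.string, tok.start))
    return occurrences, dict(edges)


snippet = "total = price * qty\nprint(total)\n"
occurrences, edges = build_word_cache(snippet)
```

Here the word `total` gets a single cache node linked to both of its occurrences, so a previously unseen name is still tied to every place it is used, with no need for a fixed vocabulary entry.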

Taxonomy

Topics: Machine Learning in Materials Science · Machine Learning and Data Classification · Software Engineering Research