Structural Language Models of Code

Uri Alon; Roy Sadaka; Omer Levy; Eran Yahav

arXiv:1910.00577 · cs.LG · July 30, 2020 · 44 cites


TL;DR

This paper introduces a structural language modeling approach for code completion that models code as an abstract syntax tree, enabling generation of arbitrary code in any programming language with improved accuracy.

Contribution

It presents a neural model leveraging AST structure for code generation, outperforming previous seq2seq and structured methods in multiple languages.

Findings

1. Outperforms seq2seq models in code generation tasks.

2. Can generate arbitrary code in any programming language.

3. Significantly improves code completion accuracy.

Abstract

We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program's abstract syntax tree (AST) by decomposing it into a product of conditional probabilities over its nodes. We present a neural model that computes these conditional probabilities by considering all AST paths leading to a target node. Unlike previous techniques that have severely restricted the kinds of expressions that can be generated in this task, our approach can generate arbitrary code in any programming language. Our model significantly outperforms both seq2seq and a variety of structured…
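The factorization described in the abstract can be illustrated with a toy example. The snippet below is a sketch only, not the authors' implementation. It uses Python's standard `ast` module, assumes a depth-first, left-to-right node order, and replaces the neural model with a hypothetical uniform distribution. The names `preorder`, `toy_conditional`, and `tree_log_prob`, and the `vocab_size` value, are invented for this illustration.

```python
import ast
import math


def preorder(node):
    """Yield AST nodes in depth-first, left-to-right order (assumed ordering)."""
    yield node
    for child in ast.iter_child_nodes(node):
        yield from preorder(child)


def toy_conditional(previous_nodes, node, vocab_size=100):
    """Hypothetical stand-in for a learned P(node | previously generated nodes).

    A real model would inspect `previous_nodes`; this placeholder ignores the
    context and returns a uniform probability over `vocab_size` node kinds.
    """
    return 1.0 / vocab_size


def tree_log_prob(source, cond=toy_conditional):
    """Sum log-conditionals over nodes: log P(tree) = sum_t log P(a_t | a_<t)."""
    nodes = list(preorder(ast.parse(source)))
    total = 0.0
    for t, node in enumerate(nodes):
        total += math.log(cond(nodes[:t], node))
    return total, len(nodes)


log_p, n_nodes = tree_log_prob("x = 1")
print(f"{n_nodes} nodes, log-probability {log_p:.3f}")
```

In an actual model, `toy_conditional` would be replaced by a learned estimator conditioned on the partial tree. According to the abstract, the paper's model computes each conditional by considering all AST paths leading to the target node.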


Code & Models

Videos

Structural Language Models of Code · slideslive

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence