Structured Generative Models of Natural Source Code

Chris J. Maddison; Daniel Tarlow

arXiv:1401.0514 · cs.PL · June 23, 2014 · 94 cites

TL;DR

This paper introduces a family of structured generative models of natural source code that incorporate both sequential and hierarchical structure, learn distributed representations of source code elements, and integrate closely with a compiler. Adding this structure greatly increases the probability the models assign to test programs.

Contribution

It describes a family of generative models of source code that combine sequential and hierarchical structure, distributed representations, and close compiler integration, plus an extension that refines how identifier tokens are generated based on which variables are currently in scope.

Findings

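01

Including appropriate structure greatly improves the models, as measured by the probability of generating test programs.

02

The models incorporate both sequential and hierarchical structure and can still be learned efficiently.

03

Close integration with a compiler lets the models leverage compiler logic and abstractions when building in structure.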

Abstract

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.
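
The evaluation idea in the abstract, scoring a model by the probability it assigns to test programs, can be illustrated with a much simpler model than the ones the paper describes. The sketch below is an assumption-laden toy, not the authors' method: it uses Python's standard `ast` module as the compiler front end, counts how each syntax-tree node type expands into child node types, and scores a program by the smoothed log-probability of its expansions. It has no distributed representations, no sequential context, and no scope handling, and the function names `productions`, `fit`, and `log_prob` are hypothetical.

```python
import ast
import math
from collections import Counter, defaultdict


def productions(source):
    """Yield (parent_type, child_types) pairs from a program's syntax tree."""
    for node in ast.walk(ast.parse(source)):
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        yield type(node).__name__, children


def fit(train_sources):
    """Count how often each parent node type expands to each child tuple."""
    counts = defaultdict(Counter)
    for src in train_sources:
        for parent, children in productions(src):
            counts[parent][children] += 1
    return counts


def log_prob(counts, source, alpha=1.0):
    """Additively smoothed log-probability of a program's expansions.

    Assumption of this toy: for each parent type, all unseen expansions
    share a single pseudo-outcome with pseudo-count alpha.
    """
    total = 0.0
    for parent, children in productions(source):
        seen = counts.get(parent, Counter())
        numer = seen[children] + alpha
        denom = sum(seen.values()) + alpha * (len(seen) + 1)
        total += math.log(numer / denom)
    return total


train = ["a = 1\nb = a + 2\n", "c = 3\nd = c + 4\n"]
model = fit(train)
test_score = log_prob(model, "e = 5\nf = e + 6\n")
```

Here `test_score` is the log-probability of a held-out program under the fitted counts; a higher (less negative) value on a fixed test set is the sense in which one model would be said to improve on another.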

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques