Structured Generative Models of Natural Source Code
Chris J. Maddison, Daniel Tarlow

TL;DR
This paper introduces a family of structured generative models for natural source code that incorporate hierarchical and sequential information, learn distributed representations, and integrate compiler logic, so that the models assign higher probability to held-out test programs.
Contribution
It presents novel generative models for source code that combine hierarchical structure, distributed representations, and compiler integration, with an extension for variable scope handling.
Findings
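Including appropriate structure greatly improves the models, as measured by the probability of generating test programs.
The models can be learned efficiently.
Close integration with a compiler lets the models leverage compiler logic and abstractions when building in structure.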
Abstract
We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.
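As a rough illustration of the evaluation idea in the abstract (scoring a model by the probability it assigns to test programs), the sketch below fits a count-based model of how each syntax-tree node expands into children and then sums log-probabilities over a test program's tree. It is not the authors' model: it uses Python's standard `ast` module in place of the compiler integration the paper describes, it has no distributed representations or scope handling, and the helper names `productions`, `fit`, and `log_prob` as well as the crude add-alpha smoothing are assumptions made for this example.

```python
import ast
import math
from collections import Counter, defaultdict


def productions(tree):
    """Yield (parent_kind, child_kinds) for every node in a Python AST."""
    for node in ast.walk(tree):
        parent = type(node).__name__
        children = tuple(type(c).__name__ for c in ast.iter_child_nodes(node))
        yield parent, children


def fit(sources):
    """Count how often each parent kind expands into each child sequence."""
    counts = defaultdict(Counter)
    for src in sources:
        for parent, children in productions(ast.parse(src)):
            counts[parent][children] += 1
    return counts


def log_prob(counts, source, alpha=1.0):
    """Sum of log P(children | parent) over all nodes of the parsed program.

    Assumption: add-alpha smoothing with a single catch-all bucket for
    unseen expansions. This is crude and not a normalized distribution
    when several different unseen expansions occur.
    """
    total = 0.0
    for parent, children in productions(ast.parse(source)):
        seen = counts.get(parent, Counter())
        denom = sum(seen.values()) + alpha * (len(seen) + 1)
        total += math.log((seen[children] + alpha) / denom)
    return total


# Tiny hypothetical training set and two test programs.
model = fit(["x = 1", "y = x + 2"])
familiar = log_prob(model, "z = 1")
unfamiliar = log_prob(model, "for i in y: pass")
```

In this toy setup, a test program whose tree shapes appeared in training (`z = 1`) receives a higher log-probability than one built from constructs the model never saw. Comparing such scores across model variants is the kind of measurement the abstract refers to, although the paper's models condition on much richer context.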
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
