Learning Python Code Suggestion with a Sparse Pointer Network
Avishkar Bhoopchand, Tim Rocktäschel, Earl Barr, Sebastian Riedel

TL;DR
This paper presents a neural language model with a sparse pointer network that significantly improves Python code suggestion accuracy, especially for long-range identifier references, by capturing dependencies over many tokens.
Contribution
The paper introduces a novel sparse pointer network architecture for neural language models, enhancing long-range dependency modeling in code suggestion systems for dynamic languages.
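A minimal sketch of the general idea, not the authors' implementation: a pointer distribution that places probability mass only on the vocabulary entries of recently seen identifiers is blended with the ordinary language-model distribution. The function names, the scalar `gate`, and the toy numbers are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_with_sparse_pointer(lm_logits, memory_token_ids, memory_scores, gate):
    """Blend a language-model distribution with a sparse pointer distribution.

    lm_logits        -- unnormalized scores over the whole vocabulary
    memory_token_ids -- vocabulary ids of recently seen identifiers
                        (hypothetical memory, most recent K entries)
    memory_scores    -- attention scores, one per memory slot
    gate             -- assumed scalar in [0, 1]: weight given to the pointer
    """
    if len(memory_token_ids) != len(memory_scores):
        raise ValueError("one attention score is needed per memory slot")
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")

    p_lm = softmax(lm_logits)
    if not memory_token_ids:
        # Nothing to point at, so fall back to the language model alone.
        return p_lm

    # Sparse pointer: only tokens held in memory receive any mass.
    attention = softmax(memory_scores)
    p_ptr = [0.0] * len(lm_logits)
    for token_id, weight in zip(memory_token_ids, attention):
        p_ptr[token_id] += weight

    return [(1.0 - gate) * a + gate * b for a, b in zip(p_lm, p_ptr)]


# Toy example: vocabulary of 5 tokens, identifier 3 seen twice, identifier 1 once.
mixed = mix_with_sparse_pointer(
    lm_logits=[0.0, 0.0, 0.0, 0.0, 0.0],
    memory_token_ids=[3, 3, 1],
    memory_scores=[0.0, 0.0, 0.0],
    gate=0.5,
)
best_token = max(range(len(mixed)), key=mixed.__getitem__)
```

In this toy run the language model is uniform, so the identifier that appears most often in memory (token 3) ends up with the highest blended probability. In a trained model the attention scores and the gate would be produced by the network rather than supplied by hand.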
Findings
Achieved a 5 percentage point increase in code suggestion accuracy over an LSTM baseline.
Obtained a much lower perplexity than the baseline.
Predicted identifiers 13 times more accurately, which accounts for the overall accuracy gain.
Abstract
To enhance developer productivity, all modern integrated development environments (IDEs) include code suggestion functionality that proposes likely next tokens at the cursor. While current IDEs work well for statically-typed languages, their reliance on type annotations means that they do not provide the same level of support for dynamic programming languages as for statically-typed languages. Moreover, suggestion engines in modern IDEs do not propose expressions or multi-statement idiomatic code. Recent work has shown that language models can improve code suggestion systems by learning from software repositories. This paper introduces a neural language model with a sparse pointer network aimed at capturing very long-range dependencies. We release a large-scale code suggestion corpus of 41M lines of Python code crawled from GitHub. On this corpus, we found standard neural language…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Softmax · Pointer Network · Long Short-Term Memory
