TL;DR
CodeFill is a novel code autocompletion model that jointly learns from code structure and naming sequences, enabling more accurate multi-token and single-token predictions by capturing long-range dependencies.
Contribution
The paper introduces CodeFill, a multi-task Transformer model that combines structure and naming information for improved code autocompletion, trained on large datasets and evaluated with realistic benchmarks.
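The multi-task setup can be pictured with a minimal sketch. It assumes that each task emits logits over its own vocabulary; the task names used here (`name`, `type`) are hypothetical. It also assumes that the per-task cross-entropy losses are combined by a fixed weighted sum. This summary does not say how CodeFill actually weights its tasks, so the weights are illustrative only.

```python
import math
from typing import Dict, List


def softmax_cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits` against the index `target`."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    task_logits: Dict[str, List[float]],
    task_targets: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses (fixed weights are an assumption)."""
    return sum(
        weights[task] * softmax_cross_entropy(logits, task_targets[task])
        for task, logits in task_logits.items()
    )
```

For example, suppose the logits are uniform over two candidate names and four candidate types, and the weights are 1.0 and 0.5. The combined loss is then ln 2 + 0.5 · ln 4 = 2 ln 2 ≈ 1.386.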
Findings
Outperforms baselines in single-token prediction (MRR: 70.9%)
Achieves state-of-the-art multi-token prediction (ROUGE-L: 63.7%)
Effectively captures long-range dependencies in code
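The two headline metrics can be made concrete with a short sketch. For MRR, it assumes a top-k suggestion list with k = 10 and scores a missed target as 0; both choices are assumptions. For ROUGE-L, it uses the F1 form of the longest-common-subsequence (LCS) score over whitespace-split tokens. The paper's exact evaluation settings may differ.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_predictions: Sequence[Sequence[str]], targets: Sequence[str], k: int = 10
) -> float:
    """Average of 1/rank of the correct token within each top-k list (0 if absent)."""
    total = 0.0
    for preds, target in zip(ranked_predictions, targets):
        top_k = list(preds[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)
    return total / len(targets)


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b):
            curr.append(prev[j] + 1 if x == y else max(prev[j + 1], curr[j]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """F1 of LCS-based precision (vs. candidate) and recall (vs. reference)."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For instance, suppose the correct tokens rank 1st and 3rd in two suggestion lists. MRR is then (1 + 1/3) / 2 ≈ 0.667. Similarly, a predicted statement `x = foo ( y )` and the reference `x = foo ( z )` share a 5-token LCS out of 6 tokens each, which gives a ROUGE-L of about 0.833.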
Abstract
Code completion is an essential feature of IDEs, yet current autocompleters are restricted to either grammar-based or NLP-based single-token completions. Both approaches have significant drawbacks: grammar-based autocompletion is restricted in dynamically typed language environments, whereas NLP-based autocompleters struggle to understand the semantics of the programming language and the developer's code context. In this work, we present CodeFill, a language model for autocompletion that combines learned structure and naming information. Using a parallel Transformer architecture and multi-task learning, CodeFill consumes sequences of source code token names and their equivalent AST token types. Uniquely, CodeFill is trained both for single-token and multi-token (statement) prediction, which enables it to learn long-range dependencies among grammatical and naming elements. We train…
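The phrase "sequences of source code token names and their equivalent AST token types" can be made concrete with a sketch. It builds two aligned sequences from a Python snippet using the standard `tokenize` and `ast` modules. The labeling scheme is hypothetical: it uses `KEYWORD`, `arg`, `attr`, and `Name.Load`, and otherwise falls back to the raw tokenizer category. It is much coarser than whatever type vocabulary CodeFill uses and only illustrates the idea of parallel name and type inputs.

```python
import ast
import io
import keyword
import tokenize

# Layout/bookkeeping tokens dropped from both sequences in this sketch.
SKIP = {tokenize.ENCODING, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}


def name_and_type_sequences(source: str):
    """Return aligned lists: token strings and a (hypothetical) type label per token."""
    # Map identifier start positions to coarse AST-derived labels.
    # Note: AST col offsets are UTF-8 byte offsets, so this assumes ASCII source.
    ast_labels = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            ast_labels[(node.lineno, node.col_offset)] = "Name." + type(node.ctx).__name__
        elif isinstance(node, ast.arg):
            ast_labels[(node.lineno, node.col_offset)] = "arg"
        elif isinstance(node, ast.Attribute):
            # The attribute name is the last thing in the node, so it ends at end_col_offset.
            ast_labels[(node.end_lineno, node.end_col_offset - len(node.attr))] = "attr"

    names, types = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        names.append(tok.string)
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            types.append("KEYWORD")
        else:
            # Fall back to the tokenizer category (e.g. NAME, OP, NUMBER, STRING).
            types.append(ast_labels.get(tok.start, tokenize.tok_name[tok.type]))
    return names, types
```

For `def add(a, b):` followed by `return a + self.total`, the type sequence starts `KEYWORD, NAME, OP, arg, ...` and ends with `attr` for `total`. The model then receives both sequences position by position.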
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Multi-Head Attention · Dense Connections · Byte Pair Encoding · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer
