TL;DR
This study evaluates Transformer models such as RoBERTa and T5 for code completion at several granularity levels. It shows that T5 is particularly effective, especially when predicting larger code segments, and extends our understanding of these models beyond single-token prediction.
Contribution
It provides a large-scale empirical analysis of Transformer models for code completion at multiple granularity levels, including entire statements and blocks, settings that had previously received little attention.
Findings
T5 achieves up to 69% accuracy in token prediction.
Transformer models can effectively support code completion at different granularities.
Perfect predictions reach around 29% for entire code blocks.
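A "perfect prediction" is one where the generated code exactly matches the code the developer actually wrote. The minimal sketch below shows how such a rate could be computed. It assumes a whitespace-normalized exact match; the study's actual normalization and tokenization may differ, and the helper names are hypothetical.

```python
from typing import List, Tuple


def is_perfect(prediction: str, target: str) -> bool:
    # Assumption: compare token sequences after collapsing whitespace,
    # so "int  i = 0;" and "int i = 0;" count as identical.
    return prediction.split() == target.split()


def perfect_prediction_rate(pairs: List[Tuple[str, str]]) -> float:
    # Fraction of (prediction, target) pairs that match exactly.
    if not pairs:
        return 0.0
    hits = sum(is_perfect(pred, tgt) for pred, tgt in pairs)
    return hits / len(pairs)


# Hypothetical model outputs paired with the expected code.
pairs = [
    ("int i = 0;", "int i = 0;"),   # perfect
    ("int  i = 0;", "int i = 0;"),  # perfect after whitespace normalization
    ("int j = 0;", "int i = 0;"),   # wrong identifier
    ("i++;", "i--;"),               # wrong operator
]

print(f"perfect predictions: {perfect_prediction_rate(pairs):.1%}")
```

Exact match is a strict metric: a prediction that is semantically equivalent but written differently (for example, `i += 1;` instead of `i++;`) counts as a miss.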
Abstract
Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field have focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for example, an entire code block must be generated. We present a large-scale study exploring the capabilities of state-of-the-art Transformer-based models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to…
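To make the granularity levels concrete, the sketch below builds (input, target) completion instances by hiding code at three levels: the last tokens of a statement, the last whole statements, and a contiguous block of lines. It is an illustrative assumption, not the study's actual data pipeline: the `<MASK>` placeholder, the one-statement-per-line layout, whitespace tokenization, and all function names are hypothetical.

```python
from typing import List, Tuple

MASK = "<MASK>"  # hypothetical placeholder for the code the model must generate


def mask_tokens(statement: str, k: int) -> Tuple[str, str]:
    """Token level: hide the last k whitespace-separated tokens of a statement."""
    tokens = statement.split()
    if not 0 < k <= len(tokens):
        raise ValueError("k must be between 1 and the number of tokens")
    visible, hidden = tokens[:-k], tokens[-k:]
    return " ".join(visible + [MASK]), " ".join(hidden)


def mask_statements(lines: List[str], n: int) -> Tuple[str, str]:
    """Statement level: hide the last n statements (assumed one per line)."""
    if not 0 < n <= len(lines):
        raise ValueError("n must be between 1 and the number of lines")
    visible, hidden = lines[:-n], lines[-n:]
    return "\n".join(visible + [MASK]), "\n".join(hidden)


def mask_block(lines: List[str], start: int, end: int) -> Tuple[str, str]:
    """Block level: hide lines[start:end], keeping the code before and after."""
    if not 0 <= start < end <= len(lines):
        raise ValueError("need 0 <= start < end <= len(lines)")
    visible = lines[:start] + [MASK] + lines[end:]
    return "\n".join(visible), "\n".join(lines[start:end])


# A small Java method body, one statement or brace per line.
method = [
    "int sum = 0;",
    "for (int v : values) {",
    "sum += v;",
    "}",
    "return sum;",
]

print(mask_tokens("return sum;", 1))  # one hidden token
print(mask_statements(method, 1))     # last statement hidden
print(mask_block(method, 1, 4))       # whole for-loop block hidden
```

The target grows from a single token to a multi-line block, which is why block-level completion is the hardest setting and yields the lowest perfect-prediction rates.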
