TL;DR
This study empirically evaluates BERT-based models, particularly RoBERTa variants, for code completion at several granularity levels (single tokens, statements, and entire code blocks), demonstrating their potential to support developers beyond next-token prediction.
Contribution
The study provides a large-scale empirical analysis of BERT models for multi-level code completion, extending beyond traditional token prediction to entire code blocks.
Findings
BERT models achieve up to 58% perfect predictions when only a few tokens must be predicted.
Models can predict entire code blocks with around 7% accuracy.
Results show BERT models are viable for advanced code completion tasks.
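The findings are reported as the share of "perfect predictions". Assuming this means an exact match between the generated code and the masked reference (the summary does not spell out the matching rules), the metric can be sketched as below. The helper names are hypothetical, and the whitespace normalization in `normalize` is an added assumption rather than something stated in the paper.

```python
def normalize(code: str) -> str:
    # Assumption: collapse whitespace runs so pure formatting differences are ignored.
    return " ".join(code.split())


def perfect_prediction_rate(predictions: list[str], references: list[str]) -> float:
    """Fraction of masked spans whose prediction exactly matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


preds = ["i ++", "return x ;", "if ( a ) { b ( ) ; }"]
refs = ["i ++", "return y ;", "if ( a )  { b ( ) ; }"]
print(perfect_prediction_rate(preds, refs))  # 2 of the 3 predictions match exactly
```

Under this definition a prediction earns no partial credit: a block that differs in a single token counts as a miss, which is one reason block-level scores sit far below token-level ones.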
Abstract
Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is still limited to the prediction of the next few tokens to type. In this work, we take a step further in this direction by presenting a large-scale empirical study aimed at exploring the capabilities of state-of-the-art deep learning (DL) models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to entire code blocks (e.g., the iterated block of a for loop). To this aim, we train and test several adapted variants of the recently proposed RoBERTa model, and evaluate its…
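The abstract describes masking code at different granularities, from single tokens up to the iterated block of a `for` loop. The following is a minimal sketch of how such masked instances could be built from a tokenized Java snippet. The helper names, the single `<MASK>` placeholder replacing the whole span, and the whitespace tokenization are all hypothetical and do not reproduce the authors' preprocessing.

```python
from dataclasses import dataclass

MASK = "<MASK>"  # hypothetical placeholder; the paper's exact masking token is not given here


@dataclass
class Instance:
    masked_input: list[str]  # code with the span replaced by one mask token
    target: list[str]        # tokens the model must generate


def mask_span(tokens: list[str], start: int, end: int) -> Instance:
    """Replace tokens[start:end] with MASK; the removed span becomes the target."""
    return Instance(tokens[:start] + [MASK] + tokens[end:], tokens[start:end])


def mask_last_tokens(tokens: list[str], stmt_start: int, stmt_end: int, n: int) -> Instance:
    """Token level: mask the final n tokens of the statement tokens[stmt_start:stmt_end]."""
    return mask_span(tokens, max(stmt_start, stmt_end - n), stmt_end)


def mask_statement(tokens: list[str], stmt_start: int, stmt_end: int) -> Instance:
    """Statement level: mask one whole statement."""
    return mask_span(tokens, stmt_start, stmt_end)


def find_matching_brace(tokens: list[str], open_idx: int) -> int:
    """Return the index of the '}' that closes the '{' at open_idx."""
    depth = 0
    for i in range(open_idx, len(tokens)):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


def mask_block(tokens: list[str], open_idx: int) -> Instance:
    """Block level: mask everything between a '{' and its matching '}'."""
    return mask_span(tokens, open_idx + 1, find_matching_brace(tokens, open_idx))


tokens = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; count ++ ; }".split()
stmt = (15, 22)  # token indices of "sum += a [ i ] ;"
for inst in (
    mask_last_tokens(tokens, *stmt, n=3),
    mask_statement(tokens, *stmt),
    mask_block(tokens, tokens.index("{")),
):
    print(" ".join(inst.target))
```

The three instances share the same source snippet, so the task difficulty grows only with the length of the masked span: three tokens, one statement, and finally the whole iterated block of the loop.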
