An Empirical Study on the Usage of BERT Models for Code Completion

Matteo Ciniselli; Nathan Cooper; Luca Pascarella; Denys Poshyvanyk; Massimiliano Di Penta; Gabriele Bavota

arXiv:2103.07115·cs.SE·March 15, 2021


TL;DR

This study empirically evaluates BERT-based models, specifically adapted RoBERTa variants, for code completion at several granularity levels (single tokens, one or more statements, and entire code blocks), showing their potential to support developers beyond next-token prediction.

Contribution

It provides a large-scale empirical analysis of BERT models for multi-level code completion, extending beyond traditional token prediction to entire code blocks.

Findings

01. BERT models achieve up to 58% perfect predictions in the simplest scenario, where only a few tokens are masked.

02. When entire code blocks are masked, the rate of perfect predictions drops to around 7%.

03. The results suggest BERT models are viable for code completion tasks beyond next-token prediction.

Abstract

Code completion is one of the main features of modern Integrated Development Environments (IDEs). Its objective is to speed up code writing by predicting the next code token(s) the developer is likely to write. Research in this area has substantially bolstered the predictive performance of these techniques. However, the support to developers is still limited to the prediction of the next few tokens to type. In this work, we take a step further in this direction by presenting a large-scale empirical study aimed at exploring the capabilities of state-of-the-art deep learning (DL) models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to entire code blocks (e.g., the iterated block of a for loop). To this aim, we train and test several adapted variants of the recently proposed RoBERTa model, and evaluate its…
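To make the granularity levels and the "perfect predictions" figures in the Findings concrete, here is a minimal Python sketch. It is not the authors' pipeline: the whitespace tokenization, the `<MASK>` placeholder, and the helper names `mask_span` and `perfect_prediction_rate` are all assumptions introduced for illustration. It hides a contiguous span of tokens (one token, or a whole loop body) and scores a prediction as perfect only when it matches the hidden span exactly.

```python
from typing import List, Sequence, Tuple

MASK = "<MASK>"  # hypothetical placeholder, not necessarily the paper's token


def mask_span(tokens: List[str], start: int, length: int) -> Tuple[List[str], List[str]]:
    """Hide tokens[start:start+length] behind one mask.

    Returns (masked input, hidden target span).
    """
    if length <= 0 or start < 0 or start + length > len(tokens):
        raise ValueError("span out of range")
    target = tokens[start:start + length]
    masked = tokens[:start] + [MASK] + tokens[start + length:]
    return masked, target


def perfect_prediction_rate(predictions: Sequence[List[str]],
                            targets: Sequence[List[str]]) -> float:
    """Fraction of predictions that match their target span token for token."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return hits / len(targets)


# Illustrative input: a for loop, tokenized by whitespace (an assumption).
code = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split()

# Token-level masking: hide the single token `n`.
token_input, token_target = mask_span(code, 9, 1)

# Block-level masking: hide the whole loop body between the braces.
block_input, block_target = mask_span(code, 15, 7)
```

Under this exact-match scoring, a block-level prediction that gets a single token wrong counts as a miss, which is one reason longer masked spans are expected to yield lower rates than short ones.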
