An Empirical Study on the Usage of Transformer Models for Code Completion

Matteo Ciniselli; Nathan Cooper; Luca Pascarella; Antonio Mastropaolo; Emad Aghajani; Denys Poshyvanyk; Massimiliano Di Penta; Gabriele Bavota

arXiv:2108.01585 · cs.SE · November 19, 2021

TL;DR

This study evaluates the Transformer models RoBERTa and T5 for code completion at several granularity levels, from single tokens to entire code blocks. The results highlight T5's effectiveness, especially when predicting larger code segments, and extend what is known about these models beyond single-token prediction.

Contribution

The paper provides a large-scale empirical analysis of Transformer models for code completion at multiple granularity levels, including entire statements and blocks, a setting that earlier work had rarely explored.

Findings

01  T5 achieves up to 69% accuracy in token prediction.

02  Transformer models can effectively support code completion at different granularities.

03  Perfect predictions reach around 29% for entire code blocks.
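
The "perfect predictions" figure in the findings can be read as an exact-match rate: a completion counts only if it reproduces the masked code in full. The sketch below illustrates that reading. It is not the authors' evaluation script; the function name `perfect_prediction_rate`, the whitespace normalization, and the sample strings are all assumptions made for illustration.

```python
from typing import Sequence


def perfect_prediction_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the fraction of predictions that exactly match their reference.

    Assumption: runs of whitespace are collapsed before comparison, so that
    formatting differences alone do not count as a mismatch.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(code: str) -> str:
        # Collapse every run of whitespace into a single space.
        return " ".join(code.split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Hypothetical model outputs and ground-truth completions.
example_predictions = ["return x ;", "i++", "foo ( )"]
example_references = ["return x ;", "i ++", "foo (  )"]
example_rate = perfect_prediction_rate(example_predictions, example_references)
```

Under this definition a near miss scores the same as a completely wrong completion, which is one reason exact-match rates drop as the masked span grows from a few tokens to a whole block.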

Abstract

Code completion aims at speeding up code writing by predicting the next code token(s) the developer is likely to write. Works in this field focused on improving the accuracy of the generated predictions, with substantial leaps forward made possible by deep learning (DL) models. However, code completion techniques are mostly evaluated in the scenario of predicting the next token to type, with few exceptions pushing the boundaries to the prediction of an entire code statement. Thus, little is known about the performance of state-of-the-art code completion approaches in more challenging scenarios in which, for example, an entire code block must be generated. We present a large-scale study exploring the capabilities of state-of-the-art Transformer-based models in supporting code completion at different granularity levels, including single tokens, one or multiple entire statements, up to…

Code & Models

Repositories

mciniselli/T5_Replication_Package
tf · Official