Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study
Tim van Dam, Maliheh Izadi, Arie van Deursen

TL;DR
This empirical study investigates whether adding contextual data such as type annotations and comments improves the performance of transformer-based code completion models, finding that removing type annotations can sometimes enhance model accuracy.
Contribution
The paper provides an empirical evaluation of the impact of contextual data on code completion models, highlighting counterintuitive results and offering guidance for model training and selection.
Findings
Models perform better without type annotations.
Multi-line comments improve model performance.
Small effect sizes indicate modest impact.
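The findings compare code with and without type annotations. To show roughly what an annotation-free variant of a file looks like, here is a minimal sketch that strips type hints from Python source with the standard `ast` module. The choice of Python, the hypothetical `strip_type_annotations` helper, and the AST round-trip are all assumptions for illustration; this page does not describe the authors' actual preprocessing.

```python
import ast


class _AnnotationStripper(ast.NodeTransformer):
    """Remove type hints from function signatures and annotated assignments."""

    def visit_FunctionDef(self, node):
        # Visit children first so parameter annotations are removed too.
        self.generic_visit(node)
        node.returns = None  # drop "-> T"
        return node

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_arg(self, node):
        node.annotation = None  # "a: int" becomes "a"
        return node

    def visit_AnnAssign(self, node):
        if node.value is None:
            # A bare "x: int" has no value; replace it with "pass" so the
            # enclosing body can never become empty (which would be invalid).
            return ast.copy_location(ast.Pass(), node)
        # "x: int = 1" becomes the plain assignment "x = 1".
        assign = ast.Assign(targets=[node.target], value=node.value, type_comment=None)
        return ast.copy_location(assign, node)


def strip_type_annotations(source: str) -> str:
    """Hypothetical helper: return `source` with type annotations removed."""
    tree = _AnnotationStripper().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    src = "def add(a: int, b: int = 0) -> int:\n    total: int = a + b\n    return total\n"
    print(strip_type_annotations(src))
```

Note that `ast.unparse` needs Python 3.9 or later and discards comments and original formatting, so a pipeline that must preserve comments (the other kind of contextual data studied here) would need a token- or CST-based rewriter instead.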
Abstract
Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks, including automatic code completion, which is a staple in a developer's toolkit. While many have striven to improve the code-understanding abilities of such models, the opposite -- making the code easier to understand -- has not been properly investigated. In this study, we aim to answer whether making code easier to understand through contextual data improves the performance of pre-trained code language models for the task of code completion. We consider type annotations and comments as two common forms of additional contextual information that often help developers understand code better. For the experiments, we study code completion at two granularity levels, token and line completion, and take three recent and large-scale language models for source code:…
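To make the two granularity levels concrete, here is a minimal sketch that turns a snippet into token-level tasks (predict the next token from everything before it) and line-level tasks (predict the next full line from the preceding lines), scored with exact match. The regex tokenizer, the helper names, and the exact-match metric are hypothetical choices for illustration, not the study's actual setup; real models use their own subword tokenizers.

```python
import re
from typing import List, Tuple

# Naive tokenizer (assumption): identifiers, integer literals, or any
# single non-whitespace character such as punctuation or an operator.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> List[str]:
    """Split source code into a flat list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(code)


def token_level_tasks(code: str) -> List[Tuple[List[str], str]]:
    """Token completion: given all preceding tokens, predict the next token."""
    tokens = tokenize(code)
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def line_level_tasks(code: str) -> List[Tuple[str, str]]:
    """Line completion: given all preceding non-blank lines, predict the next line."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return [("\n".join(lines[:i]), lines[i].strip()) for i in range(1, len(lines))]


def exact_match(predictions: List[str], targets: List[str]) -> float:
    """Fraction of targets whose paired prediction matches exactly (after trimming)."""
    if not targets:
        return 0.0
    hits = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    print(token_level_tasks(snippet)[0])
    print(line_level_tasks(snippet))
```

For the two-line snippet in the usage example, the tokenizer yields 12 tokens and therefore 11 token-level tasks, but only one line-level task, which hints at why line completion is the harder and sparser of the two settings.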
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Engineering Techniques and Practices
Methods: Test
