Code Prediction by Feeding Trees to Transformers

Seohyun Kim; Jinman Zhao; Yuchi Tian; Satish Chandra

arXiv:2003.13848 · cs.SE · June 2, 2023 · 32 cites

TL;DR

This paper improves next-token code prediction for autocomplete systems by applying the Transformer architecture and making it aware of the syntactic structure of code. The resulting system outperforms an RNN-based system by 18.3%, Deep3 by 14.1%, and an adaptation of Code2Seq by 14.4%.

Contribution

It introduces methods to make Transformers syntactically aware for code prediction, surpassing prior neural and non-neural systems in accuracy.

Findings

01. Transformer outperforms previous models in code prediction accuracy

02. Syntactic awareness further improves Transformer performance

03. Achieves 18.3% better accuracy than RNN-based systems

Abstract

We advance the state-of-the-art in the accuracy of code prediction (next token prediction) used in autocomplete systems. First, we report that using the recently proposed Transformer architecture even out-of-the-box outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a…
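Because a Transformer consumes a flat sequence, a tree has to be linearized before it can be fed in. The sketch below shows one such linearization, a pre-order (depth-first) traversal of a Python AST. It assumes Python's standard `ast` module; the helper name `preorder_tokens` is hypothetical, and this is an illustration rather than the authors' implementation or data format.

```python
import ast
from typing import List


def preorder_tokens(source: str) -> List[str]:
    """Flatten a Python AST into a token list by pre-order traversal.

    Illustrative sketch only: each node contributes its type name,
    and identifier/constant nodes also contribute their value.
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Emit the node type first (pre-order), then any leaf value.
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Attribute):
            tokens.append(node.attr)
        elif isinstance(node, ast.Constant):
            tokens.append(repr(node.value))
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers to keep the sequence short.
            if isinstance(child, ast.expr_context):
                continue
            visit(child)

    visit(ast.parse(source))
    return tokens


if __name__ == "__main__":
    print(preorder_tokens("x = 1"))
```

A next-token model trained on such sequences predicts the next tree node rather than the next source token, which is one way structure can reach a sequence model without changing its architecture.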

Code & Models

Repositories

facebookresearch/code-prediction-transformer
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques