Code Prediction by Feeding Trees to Transformers
Seohyun Kim, Jinman Zhao, Yuchi Tian, Satish Chandra

TL;DR
This paper improves the accuracy of code prediction (next token prediction) in autocomplete systems by applying the Transformer architecture and making it aware of the syntactic structure of code. The resulting system outperforms an RNN-based baseline by 18.3%, Deep3 by 14.1%, and an adaptation of Code2Seq by 14.4%.
Contribution
It introduces methods to make Transformers syntactically aware for code prediction, surpassing prior neural and non-neural systems in accuracy.
Findings
Transformer outperforms previous models in code prediction accuracy
Syntactic awareness further improves Transformer performance
Outperforms an RNN-based system by 18.3%, Deep3 by 14.1%, and an adaptation of Code2Seq by 14.4% in accuracy
Abstract
We advance the state-of-the-art in the accuracy of code prediction (next token prediction) used in autocomplete systems. First, we report that using the recently proposed Transformer architecture even out-of-the-box outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms the accuracy of an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3%, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a…
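The abstract notes that a Transformer consumes sequences, so tree-shaped code structure has to be communicated to it in some linear form. As a minimal sketch of one plausible way to do this, assuming a pre-order depth-first traversal of a Python abstract syntax tree, the snippet below flattens a tree into a token sequence that a sequence model could then be trained on for next token prediction. The function name preorder_tokens, the choice to skip load/store context nodes, and the handling of leaf values are illustrative assumptions, not the authors' implementation.

```python
import ast
from typing import List


def preorder_tokens(source: str) -> List[str]:
    """Flatten the AST of `source` into a sequence via pre-order DFS.

    Hypothetical helper for illustration only: each node contributes its
    type name, and identifier/constant nodes also contribute their value.
    """
    tree = ast.parse(source)
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        # Emit the node type first (pre-order), then any leaf value.
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Constant):
            tokens.append(repr(node.value))
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del markers to keep the sequence compact
            # (an assumption made for this sketch).
            if isinstance(child, ast.expr_context):
                continue
            visit(child)

    visit(tree)
    return tokens


if __name__ == "__main__":
    seq = preorder_tokens("x = 1")
    print(seq)
    # Next token prediction framing: each prefix is a context,
    # and the element that follows it is the prediction target.
    for i in range(1, len(seq)):
        print(seq[:i], "->", seq[i])
```

In this framing, each prefix of the flattened sequence serves as the context and the element that follows it is the prediction target, so both node types and leaf values become things the model can be asked to predict.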
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
