CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu

TL;DR
CodeXGLUE is a comprehensive benchmark dataset designed to advance machine learning research in programming language understanding and generation, providing multiple tasks, datasets, and baseline models for evaluation.
Contribution
The paper introduces a unified platform that bundles diverse datasets, tasks, and baseline models to accelerate research in code understanding and generation.
Findings
Includes 10 tasks across 14 datasets
Provides three baseline systems: BERT-style, GPT-style, and Encoder-Decoder models
Offers a platform for standardized model evaluation and comparison
Abstract
Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
