ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang, Zheng Li, Haifeng Qian, Chenghao Yang, Zijian Wang, Mingyue Shang, Varun Kumar, Samson Tan, Baishakhi Ray, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Dan Roth, Bing Xiang

TL;DR
ReCode introduces a comprehensive benchmark for evaluating the robustness of code generation models against various realistic perturbations, highlighting their vulnerabilities and guiding future improvements.
Contribution
This paper presents ReCode, the first extensive robustness evaluation benchmark for code generation models with tailored transformations and semantic-preserving perturbations.
Findings
CodeGen shows better robustness than InCoder and GPT-J.
Models are most sensitive to syntax perturbations.
Robustness evaluation is more challenging on MBPP than HumanEval.
Abstract
Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide…
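To make the idea of a semantic-preserving perturbation concrete, here is a minimal sketch of one kind of transformation the abstract mentions: changing a function name in the prompt. It is not ReCode's implementation. The helpers `snake_to_camel` and `rename_function` and the toy prompt are hypothetical names chosen for illustration, and the snake_case to camelCase rule is just one plausible renaming.

```python
import re


def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase (hypothetical helper)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def rename_function(prompt: str, old_name: str) -> str:
    """Rename every whole-word occurrence of old_name in the prompt.

    The docstring text and examples are otherwise left untouched, so the
    task described by the prompt keeps the same meaning.
    """
    new_name = snake_to_camel(old_name)
    return re.sub(rf"\b{re.escape(old_name)}\b", new_name, prompt)


# Toy prompt, written for this illustration only.
prompt = '''def has_close_elements(numbers, threshold):
    """Check whether any two numbers are closer than threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    """
'''

perturbed = rename_function(prompt, "has_close_elements")
print(perturbed)
```

A robustness check in this spirit would feed both `prompt` and `perturbed` to the same model and compare whether the completions still pass the same tests.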
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Advanced Malware Detection Techniques
Methods: CodeGen
