COCO: Testing Code Generation Systems via Concretized Instructions
Ming Yan, Junjie Chen, Jie M. Zhang, Xuejie Cao, Chen Yang, Mark Harman

TL;DR
This paper introduces COCO, a testing technique that evaluates the robustness of code generation systems using concretized instructions. It detects more robustness issues than existing techniques, and its concretized instructions help reduce robustness inconsistencies.
Contribution
COCO is a new method that makes instructions more concrete to better test and improve the robustness of code generation systems, outperforming existing testing techniques.
Findings
COCO outperforms existing techniques by over 466%.
Concretized instructions help reduce robustness inconsistencies by up to 53.91%.
COCO is effective on commercial tools such as Copilot and ChatGPT.
Abstract
Code generation systems have been extensively developed in recent years to generate source code based on natural language instructions. However, despite their advancements, these systems still face robustness issues where even slightly different instructions can result in significantly different code semantics. Robustness is critical for code generation systems, as it can have significant impacts on software development, software quality, and trust in the generated code. Although existing testing techniques for general text-to-text software can detect some robustness issues, they are limited in effectiveness due to ignoring the characteristics of code generation systems. In this work, we propose a novel technique COCO to test the robustness of code generation systems. It exploits the usage scenario of code generation systems to make the original programming instruction more concrete by…
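The robustness notion in the abstract (a slightly different instruction should not change what the generated code does) can be illustrated with a small check. This is only a sketch under stated assumptions, not the authors' implementation: `toy_generator` is a hypothetical stand-in for a real code generation system, the concretizing clause "Use a for loop." is an invented example (the abstract is truncated before it explains how COCO builds concretized instructions), and comparing outputs on a few probe inputs is just one simple way to approximate a semantic comparison.

```python
from typing import Any, Callable, Iterable, Tuple


def load_function(source: str, name: str) -> Callable[..., Any]:
    """Execute generated source in a fresh namespace and return the named function."""
    namespace: dict = {}
    exec(source, namespace)  # only safe here because the source is our own toy output
    return namespace[name]


def run_probe(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Run one probe; record the exception type instead of crashing."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def behaves_same(src_a: str, src_b: str, name: str,
                 probes: Iterable[Tuple[Any, ...]]) -> bool:
    """Approximate semantic consistency by comparing outputs on probe inputs."""
    f = load_function(src_a, name)
    g = load_function(src_b, name)
    return all(run_probe(f, args) == run_probe(g, args) for args in probes)


def toy_generator(instruction: str) -> str:
    """Hypothetical code generator: returns buggy code when asked for a loop."""
    if "loop" in instruction:
        # Deliberate bug: skips the first element.
        return (
            "def total(xs):\n"
            "    acc = 0\n"
            "    for x in xs[1:]:\n"
            "        acc += x\n"
            "    return acc\n"
        )
    return "def total(xs):\n    return sum(xs)\n"


original = "Return the sum of a list of numbers."
concretized = original + " Use a for loop."  # assumed form of a concretized instruction
probes = [([1, 2, 3],), ([],), ([5],)]

consistent = behaves_same(
    toy_generator(original), toy_generator(concretized), "total", probes
)
print("consistent" if consistent else "robustness inconsistency detected")
```

In this toy setup the concretized instruction asks for something the original code could already satisfy, so a robust generator should produce behaviorally equivalent code; the stub does not, and the check flags it.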
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
