NLPerturbator: Studying the Robustness of Code LLMs to Natural Language Variations
Junkai Chen, Zhenhao Li, Xing Hu, Xin Xia

TL;DR
This paper investigates how natural language variations affect the robustness of code-generating large language models, revealing significant performance drops and emphasizing the need for more resilient prompt design.
Contribution
It introduces NLPerturbator, an automated framework for simulating real-world natural language perturbations, and provides a comprehensive analysis of their impact on code LLMs.
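To make the idea of a prompt perturbation concrete, here is a minimal sketch of one kind of variation: a typo that swaps adjacent letters. The function name `swap_adjacent_chars`, its `rate` and `seed` parameters, and the sample prompt are all hypothetical. This is not NLPerturbator's implementation, and this summary does not say whether such a typo is among the paper's perturbation categories.

```python
import random


def swap_adjacent_chars(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Hypothetical typo perturbation: swap neighbouring letters.

    Each pair of adjacent alphabetic characters is swapped with
    probability `rate`. A fixed `seed` keeps the result reproducible.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


# Illustrative prompt, not taken from the paper's benchmarks.
prompt = "Write a function that returns the sum of a list of integers."
perturbed = swap_adjacent_chars(prompt, rate=0.1, seed=42)
```

The perturbed prompt keeps the same characters and length as the original, so any change in the generated code can be attributed to the typos alone.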
Findings
Perturbed prompts can reduce code generation performance by up to 21.2%.
Natural language variations cause average performance drops of 4.8% to 6.1%.
These results highlight the importance of robust prompt construction for code LLMs.
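This summary does not say whether the reported drops are absolute (percentage points) or relative to the original score. The sketch below shows both calculations on made-up pass/fail results; the helper names and the data are illustrative assumptions, not figures or code from the paper.

```python
def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks whose generated code passed its tests."""
    return sum(results) / len(results)


def absolute_drop(original: float, perturbed: float) -> float:
    """Drop as a difference of pass rates (multiply by 100 for points)."""
    return original - perturbed


def relative_drop(original: float, perturbed: float) -> float:
    """Drop as a fraction of the original pass rate."""
    return (original - perturbed) / original


# Made-up outcomes for 10 tasks, before and after perturbing the prompts.
original_results = [True] * 8 + [False] * 2
perturbed_results = [True] * 6 + [False] * 4

orig_rate = pass_rate(original_results)
pert_rate = pass_rate(perturbed_results)

abs_drop = absolute_drop(orig_rate, pert_rate)
rel_drop = relative_drop(orig_rate, pert_rate)
```

The two measures can differ noticeably for the same data, so it is worth checking which one a reported percentage refers to before comparing results across studies.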
Abstract
Large language models (LLMs) achieve promising results in code generation based on a given natural language description. They have been integrated into open-source projects and commercial products to facilitate daily coding activities. The natural language description in the prompt is crucial for LLMs to comprehend users' requirements. Prior studies have found that LLMs are sensitive to changes in prompts, including slight changes that look inconspicuous. However, natural language descriptions often vary in real-world scenarios (e.g., different formats, grammar, and wording). Prior studies on the robustness of LLMs are often based on random perturbations, and such perturbations may not actually occur in practice. In this paper, we conduct a comprehensive study to investigate how robust code LLMs are to variations of natural language descriptions in real-world scenarios. We summarize 18…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
