Exploring the Robustness of Large Language Models for Solving Programming Problems
Atsushi Shirafuji, Yutaka Watanobe, Takumi Ito, Makoto Morishita, Yuki Nakamura, Yusuke Oda, Jun Suzuki

TL;DR
This paper investigates how robust large language models are in solving programming problems, revealing that while models like Codex are sensitive to superficial changes, newer models like ChatGPT are more resilient and effective.
Contribution
The study provides a comparative analysis of the robustness of various LLMs in code generation, highlighting the improved resilience of state-of-the-art models against prompt modifications.
Findings
Codex and CodeGen are sensitive to superficial description changes.
Randomizing variable names significantly reduces Codex's success rate.
ChatGPT and InstructGPT demonstrate higher robustness and better problem-solving capabilities.
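The variable-name finding above can be illustrated with a small perturbation sketch. This is not the authors' procedure, which this summary does not describe. It assumes the caller already knows which tokens in a problem description are variable names, and the function name `randomize_variable_names` and its parameters are hypothetical.

```python
import random
import re
import string


def randomize_variable_names(description, variables, seed=0, length=6):
    """Replace each listed variable name with a random lowercase identifier.

    Hypothetical helper for illustration only; `variables` must be supplied
    by the caller, since this sketch does not detect variable names itself.
    """
    if not variables:
        return description, {}

    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    used = set(variables)
    mapping = {}
    for name in variables:
        new_name = name
        # Redraw until the name differs from every original and earlier draw.
        while new_name in used:
            new_name = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(length)
            )
        used.add(new_name)
        mapping[name] = new_name

    # Match whole tokens only, so "n" does not rewrite the "n" in "integer".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variables) + r")\b"
    )
    rewritten = pattern.sub(lambda m: mapping[m.group(1)], description)
    return rewritten, mapping


if __name__ == "__main__":
    text = "Given integers N and M, print N + M."
    perturbed, names = randomize_variable_names(text, ["N", "M"], seed=1)
    print(perturbed)
    print(names)
```

The original and the perturbed description could then be given to the same model to compare solve rates. Note that a variable name that is also an ordinary word (for example `a`) would be replaced wherever it appears as a whole token, so real descriptions may need more careful tokenization.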
Abstract
Using large language models (LLMs) for source code has recently gained attention. Transformer-based LLMs such as Codex and ChatGPT have been shown to be highly capable of solving a wide range of programming problems. However, it remains unclear to what extent LLMs understand problem descriptions and generate programs accordingly, rather than simply retrieving source code from the most relevant problem in their training data based on superficial cues. To explore this research question, we conduct experiments to understand the robustness of several popular LLMs, CodeGen and GPT-3.5 series models, capable of tackling code generation tasks in introductory programming problems. Our experimental results show that CodeGen and Codex are sensitive to superficial modifications of problem descriptions, which significantly impact code generation performance. Furthermore, we observe that…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Adam · Byte Pair Encoding · Residual Connection · Weight Decay · Softmax
