Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
Li Zhong, Zilong Wang

TL;DR
This paper investigates the robustness and reliability of large language models in generating code, revealing significant API misuse issues that could impact real-world software development, especially for novice users.
Contribution
It introduces RobustAPI, a new dataset of real-world coding questions, and analyzes API misuse patterns in LLM-generated code, highlighting critical reliability concerns.
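As a rough illustration of what checking generated code for an API misuse pattern can look like, the sketch below flags one common pattern in Python source: `open()` calls that are not managed by a `with` statement. This is not the paper's checker and does not use RobustAPI. The function name `find_unmanaged_open_calls`, the choice of Python, and the single rule it applies are all assumptions made for this example.

```python
import ast


def find_unmanaged_open_calls(source):
    """Return sorted line numbers of open() calls that are not the
    context expression of a with-statement (hypothetical toy rule)."""
    tree = ast.parse(source)

    # Collect every expression used directly as `with <expr> as ...`
    managed = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.With, ast.AsyncWith)):
            for item in node.items:
                managed.add(id(item.context_expr))

    # Flag bare open(...) calls that are not in the managed set
    flagged = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and id(node) not in managed
        ):
            flagged.append(node.lineno)
    return sorted(flagged)


if __name__ == "__main__":
    snippet = (
        "f = open('a.txt')\n"
        "data = f.read()\n"
        "with open('b.txt') as g:\n"
        "    data = g.read()\n"
    )
    print(find_unmanaged_open_calls(snippet))
```

A rule this simple produces false positives, for example on code that closes the file in a `finally` block, so it should be read only as a picture of the idea of pattern-based misuse detection.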
Findings
62% of the code generated by GPT-4 contains API misuses
Existing benchmarks do not reflect real-world coding challenges
API misuses can cause resource leaks and crashes
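To make the last finding concrete, here is a minimal sketch of how code that runs correctly on the happy path can still misuse an API. It is written in Python for brevity and is not taken from the paper or its dataset. Both function names are hypothetical.

```python
import os
import tempfile


def read_first_line_fragile(path):
    # Misuse: a missing file raises FileNotFoundError to the caller, and if
    # readline() raises, close() is skipped and cleanup is left to the
    # garbage collector.
    f = open(path, encoding="utf-8")
    line = f.readline()
    f.close()
    return line.rstrip("\n")


def read_first_line_robust(path, default=""):
    # The with-block closes the file even if reading raises.
    try:
        with open(path, encoding="utf-8") as f:
            return f.readline().rstrip("\n")
    except FileNotFoundError:
        # A missing file is handled explicitly instead of crashing the caller.
        return default


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello\nworld\n")
        print(read_first_line_fragile(path))
        print(read_first_line_robust(path))
        print(repr(read_first_line_robust(os.path.join(tmp, "missing.txt"))))
```

Both versions return the same result for a valid file, which is why a test that only checks output on well-formed input would not distinguish them.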
Abstract
Recently, large language models (LLMs) have shown extraordinary ability in understanding natural language and generating programming code. It has become common practice for software engineers to consult LLMs when encountering coding questions. Although efforts have been made to avoid syntax errors and align the code with the intended semantics, the reliability and robustness of code generation from LLMs have not yet been thoroughly studied. Executable code is not equivalent to reliable and robust code, especially in the context of real-world software development. The misuse of APIs in the generated code could lead to severe problems, such as resource leaks and program crashes. To make things worse, the users of LLM code generation services are actually the developers who are most vulnerable to code that seems right: they are always novice developers that are not…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections
