Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation

Li Zhong; Zilong Wang

arXiv:2308.10335 · cs.CL · January 30, 2024 · 6 cites

TL;DR

This paper investigates the robustness and reliability of large language models in generating code, revealing significant API misuse issues that could impact real-world software development, especially for novice users.

Contribution

It introduces RobustAPI, a new dataset of real-world coding questions, and analyzes API misuse patterns in LLM-generated code, highlighting critical reliability concerns.

Findings

01. 62% of GPT-4 generated code contains API misuses

02. Existing benchmarks do not reflect real-world coding challenges

03. API misuses can cause resource leaks and crashes

Abstract

Recently, large language models (LLMs) have shown an extraordinary ability to understand natural language and generate programming code. It has become common practice for software engineers to consult LLMs when they encounter coding questions. Although efforts have been made to avoid syntax errors and align the code with the intended semantics, the reliability and robustness of code generation from LLMs have not yet been thoroughly studied. Executable code is not equivalent to reliable and robust code, especially in the context of real-world software development. The misuse of APIs in the generated code could lead to severe problems, such as resource leaks and program crashes. To make things worse, the users of LLM code generation services are the developers most vulnerable to code that merely seems right: they are always novice developers that are not…
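
To make the abstract's distinction between executable code and reliable code concrete, here is a minimal illustrative sketch in Python. It is not taken from the paper or from the RobustAPI dataset. The function names and the sample file are hypothetical, and the file-handling API was chosen only because it is a familiar case of a resource that must be released.

```python
import os
import tempfile


def read_first_line_leaky(path):
    # Hypothetical "looks right" answer: it runs and returns the correct
    # value, but it never closes the file handle explicitly. Release of
    # the handle is left to the garbage collector, which is a misuse of
    # the file API even though no error is raised.
    f = open(path, encoding="utf-8")
    return f.readline().rstrip("\n")


def read_first_line_safe(path):
    # Same behavior, but the context manager guarantees the handle is
    # closed, even if readline() raises.
    with open(path, encoding="utf-8") as f:
        return f.readline().rstrip("\n")


def demo():
    # Build a throwaway file and call both variants on it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.write("hello\nworld\n")
        return read_first_line_leaky(path), read_first_line_safe(path)
```

Both functions return the same result on this input, which is the point: a test that checks only the output cannot tell them apart, while the first variant can exhaust file descriptors in a long-running process.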

Code & Models

Repositories

floridsleeves/robustapi
none · Official

Datasets

LilyZZZ/RobustAPI
dataset · 6 dl

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections