Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation
Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi

TL;DR
This study evaluates ChatGPT's code generation against human programmers using 14 code quality metrics, finding strengths in concise, efficient code and in data analysis tasks but limitations in visual-graphical challenges.
Contribution
It introduces a novel dataset and evaluation methodology for assessing ChatGPT's code generation, providing insights into its capabilities and limitations with quantitative and qualitative analysis.
Findings
ChatGPT achieves 93.1% accuracy in data analysis tasks
It tends to produce concise, efficient, modular code with good error handling
Machine learning models can distinguish ChatGPT code from human code with 88% accuracy
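The last finding can be illustrated with a minimal sketch of an authorship classifier. This summary does not say which features or models the paper used, so everything here is an assumption: the four style features, the nearest-centroid rule, and the function names `style_features`, `train`, `predict`, and `accuracy` are hypothetical and chosen only to show the general idea of separating two code sources by surface style.

```python
from statistics import mean


def style_features(source):
    """Map a code string to a few simple style features (hypothetical choices)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = len(lines) or 1  # avoid division by zero on empty input
    comment_ratio = sum(ln.lstrip().startswith("#") for ln in lines) / n
    avg_len = mean(len(ln) for ln in lines) if lines else 0.0
    def_ratio = sum(ln.lstrip().startswith("def ") for ln in lines) / n
    try_ratio = sum(ln.lstrip().startswith(("try:", "except")) for ln in lines) / n
    # Scale average line length so all features are roughly in [0, 1].
    return [comment_ratio, avg_len / 80.0, def_ratio, try_ratio]


def train(samples):
    """Build one centroid per label from (source, label) pairs."""
    by_label = {}
    for source, label in samples:
        by_label.setdefault(label, []).append(style_features(source))
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(model, source):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    x = style_features(source)

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(model, key=lambda label: distance(model[label]))


def accuracy(model, samples):
    """Fraction of (source, label) pairs that the model labels correctly."""
    return sum(predict(model, src) == label for src, label in samples) / len(samples)
```

To use it, call `train` on labeled pairs such as `(code_string, "chatgpt")` and `(code_string, "human")`, then score held-out pairs with `accuracy`. A real study would need a proper train/test split and a richer feature set than this sketch assumes.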
Abstract
This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, particularly in data analysis tasks (93.1% accuracy), alongside limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Software Engineering Research
