A case study on the transformative potential of AI in software engineering on LeetCode and ChatGPT
Manuel Merkel, Jens Dörpinghaus

TL;DR
This study compares GPT-4o generated code with human-written code on LeetCode, finding GPT-4o produces comparable or lower quality code and faces challenges generalising to unseen problems, providing valuable insights into AI's role in software engineering.
Contribution
First large-scale comparison of AI-generated and human-written code on LeetCode across multiple quality and performance metrics.
Findings
GPT-4o code has similar or lower quality than human code
Generated code shows lower understandability and runtime performance
GPT-4o struggles with problems outside its training data
Abstract
The recent surge in the field of generative artificial intelligence (GenAI) has the potential to bring about transformative changes across a range of sectors, including software engineering and education. As GenAI tools, such as OpenAI's ChatGPT, are increasingly utilised in software engineering, it becomes imperative to understand the impact of these technologies on the software product. This study employs a methodological approach, comprising web scraping and data mining from LeetCode, with the objective of comparing the software quality of Python programs produced by LeetCode users with that generated by GPT-4o. In order to gain insight into these matters, this study addresses the question whether GPT-4o produces software of superior quality to that produced by humans. The findings indicate that GPT-4o does not present a considerable impediment to code quality, understandability,…
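As a rough illustration of the kind of comparison the abstract describes, the sketch below scores two Python solutions to the same problem on two simple static measures. It is a minimal example under stated assumptions, not the study's method: the excerpt does not say which metrics or tools were used, so the branch-counting complexity proxy, the helper names `rough_complexity` and `logical_lines`, and the two sample solutions are all hypothetical.

```python
import ast

# Node types counted as decision points. This choice is an assumption for
# the sketch, not the metric definition used in the study.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.comprehension)


def rough_complexity(source: str) -> int:
    """Return 1 plus the number of branching constructs in the source."""
    count = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            count += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and`/`or` adds one more path.
            count += len(node.values) - 1
    return count


def logical_lines(source: str) -> int:
    """Count lines that are neither blank nor pure comments."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))


# Two hypothetical solutions to the same task (illustrative only; neither
# is taken from LeetCode users or from GPT-4o output).
solution_a = """
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []
"""

solution_b = """
def two_sum(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []
"""

for name, src in (("solution_a", solution_a), ("solution_b", solution_b)):
    print(name, rough_complexity(src), logical_lines(src))
```

Static measures like these say nothing about runtime performance, which would have to be measured separately, for example from the platform's own submission results.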
Taxonomy
Topics: Technology and Data Analysis · Big Data Technologies and Applications
