TL;DR
This study evaluates ChatGPT 4's ability to generate code across 19 languages, analyzing success rates, error types, and efficiency, revealing strengths in popular, low-abstraction, statically typed languages and identifying areas for improvement.
Contribution
It provides a comprehensive, quantitative comparison of ChatGPT 4's code generation performance across multiple programming languages and difficulty levels, highlighting language-specific strengths and weaknesses.
Findings
ChatGPT 4 solved 39.67% of tasks, with success decreasing on harder problems.
The model was more competent in widely used, low-abstraction, statically typed languages.
Runtime efficiency was above average across all languages, while memory efficiency varied.
Abstract
This bachelor's thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages. Through a quantitative experiment, the study analyzed solution rates across three difficulty levels, the types of errors encountered, and code quality in terms of runtime and memory efficiency. A total of 188 programming problems were selected from the LeetCode platform, and ChatGPT 4 was given three attempts, with feedback, to produce a correct solution. ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased. Notably, the model faced considerable challenges with hard problems across all languages. ChatGPT 4 demonstrated higher competence in widely used languages, likely due to a larger volume and higher quality of training data. The solution rates also revealed a preference for languages with low abstraction…
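The protocol described in the abstract (three attempts per task, with feedback between attempts, followed by a solution rate) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual harness: `generate`, `judge`, `solve_with_feedback`, and `solution_rate` are hypothetical names, and the assumption that feedback is the judge's error message is ours.

```python
from typing import Callable, List, Optional


def solve_with_feedback(
    generate: Callable[[str, Optional[str]], str],  # hypothetical model wrapper
    judge: Callable[[str], Optional[str]],          # hypothetical judge: None = accepted
    problem: str,
    max_attempts: int = 3,
) -> bool:
    """Return True if any attempt is accepted within max_attempts."""
    feedback: Optional[str] = None  # no feedback exists before the first attempt
    for _ in range(max_attempts):
        solution = generate(problem, feedback)
        feedback = judge(solution)  # assumed: an error message when rejected
        if feedback is None:
            return True
    return False


def solution_rate(outcomes: List[bool]) -> float:
    """Percentage of tasks solved; 0.0 for an empty list."""
    if not outcomes:
        return 0.0
    return 100.0 * sum(outcomes) / len(outcomes)
```

Under these assumptions, a task counts as solved if any of the three attempts is accepted, and the reported solution rate is the share of solved tasks over all tasks attempted.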
