Evaluating ChatGPT-3.5 Efficiency in Solving Coding Problems of Different Complexity Levels: An Empirical Analysis

Minda Li; Bhaskar Krishnamachari

arXiv:2411.07529 · cs.SE · November 13, 2024

TL;DR

This study empirically evaluates ChatGPT-3.5's ability to solve coding problems of varying difficulty levels on LeetCode, demonstrating that prompt engineering and language choice significantly impact its performance.

Contribution

It provides a systematic analysis of ChatGPT-3.5's problem-solving capabilities across difficulty levels, highlighting the effects of prompt engineering and the choice of programming language.

Findings

1. ChatGPT solves 92% of easy, 79% of medium, and 51% of hard problems.
2. Prompt engineering improves performance, especially on easier problems.
3. ChatGPT performs best in Python, Java, and C++, with limited success in less common languages.
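
The solve rates above are per-difficulty acceptance fractions. The Python sketch below shows how such rates could be tallied from per-problem outcomes, and how Hypothesis 1 (fewer problems solved as difficulty rises) reduces to a monotonicity check. The (difficulty, accepted) record format and the function names solve_rates and decreases_with_difficulty are hypothetical, not taken from the authors' repository; the toy outcomes are illustrative, and only the 92%/79%/51% figures come from the findings.

```python
from collections import defaultdict

# Hypothetical record format: (difficulty, accepted), where "accepted" means the
# generated solution passed LeetCode's judge. Not the authors' data schema.
Outcome = tuple[str, bool]


def solve_rates(outcomes: list[Outcome]) -> dict[str, float]:
    """Fraction of problems solved at each difficulty level."""
    solved: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for difficulty, accepted in outcomes:
        total[difficulty] += 1
        solved[difficulty] += int(accepted)
    return {d: solved[d] / total[d] for d in total}


def decreases_with_difficulty(
    rates: dict[str, float],
    order: tuple[str, ...] = ("easy", "medium", "hard"),
) -> bool:
    """True if the solve rate strictly falls from easy to medium to hard (Hypothesis 1)."""
    values = [rates[d] for d in order]
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # Illustrative toy outcomes, not experimental data.
    toy = [("easy", True), ("easy", True), ("medium", True),
           ("medium", False), ("hard", False), ("hard", False)]
    print(solve_rates(toy))  # {'easy': 1.0, 'medium': 0.5, 'hard': 0.0}
    # The rates reported in the findings satisfy the check.
    print(decreases_with_difficulty({"easy": 0.92, "medium": 0.79, "hard": 0.51}))  # True
```

A strict inequality is used so that equal rates at two adjacent levels do not count as support for the hypothesis; a statistical test would be needed to say whether a given drop is significant.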

Abstract

ChatGPT and other large language models (LLMs) promise to revolutionize software development by automatically generating code from program specifications. We assess the performance of ChatGPT's GPT-3.5-turbo model on LeetCode, a popular platform with algorithmic coding challenges for technical interview practice, across three difficulty levels: easy, medium, and hard. We test three main hypotheses. First, ChatGPT solves fewer problems as difficulty rises (Hypothesis 1). Second, prompt engineering improves ChatGPT's performance, with greater gains on easier problems and diminishing returns on harder ones (Hypothesis 2). Third, ChatGPT performs better in popular languages like Python, Java, and C++ than in less common ones like Elixir, Erlang, and Racket (Hypothesis 3). To investigate these hypotheses, we conduct automated experiments using Python scripts to generate prompts that instruct…
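
The abstract describes Python scripts that generate prompts automatically. A minimal sketch of that kind of generator follows, assuming each problem is held in a hypothetical Problem record and filled into a hypothetical template. The template wording, the field names, and the extra_instructions hook (a stand-in for a prompt-engineering variant) are illustrative assumptions, not the prompts used in the paper. Sending the prompt to gpt-3.5-turbo and submitting the answer to LeetCode are out of scope here.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    # Hypothetical fields; the paper's scripts may structure problems differently.
    title: str
    difficulty: str      # "easy", "medium", or "hard"
    description: str
    starter_code: str    # LeetCode's function-signature stub for the target language


# Illustrative template only; NOT the prompt wording used in the paper.
BASE_TEMPLATE = (
    "Solve the following LeetCode problem in {language}.\n"
    "Return only the completed code, starting from this stub:\n"
    "{starter_code}\n\n"
    "Problem: {title}\n{description}\n"
)


def build_prompt(problem: Problem, language: str, extra_instructions: str = "") -> str:
    """Fill the template; extra_instructions stands in for a prompt-engineering variant."""
    prompt = BASE_TEMPLATE.format(
        language=language,
        starter_code=problem.starter_code,
        title=problem.title,
        description=problem.description,
    )
    if extra_instructions:
        prompt += "\n" + extra_instructions + "\n"
    return prompt


if __name__ == "__main__":
    p = Problem(
        title="Two Sum",
        difficulty="easy",
        description="Return indices of the two numbers that add up to target.",
        starter_code="class Solution:\n    def twoSum(self, nums, target):\n        pass",
    )
    # Languages named in Hypothesis 3; in practice the stub would differ per language.
    for lang in ["Python", "Java", "C++", "Elixir", "Erlang", "Racket"]:
        print(build_prompt(p, lang).splitlines()[0])
```

Keeping the language as a template parameter lets the same problem set be replayed across languages, which is what a comparison like Hypothesis 3 requires; comparing prompts with and without extra_instructions is one way to frame Hypothesis 2.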

Code & Models

Repositories

anrgusc/coding_gpt_testing (official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Attention Dropout