Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation

Luis Mayer; Christian Heumann; Matthias Aßenmacher

arXiv:2409.04164·cs.CL·September 9, 2024

TL;DR

This study compares the performance of five large language models in text-to-code generation tasks, finding that ChatGPT outperforms specialized models like Code Llama in solving programming problems from LeetCode.

Contribution

It provides an empirical evaluation of multiple LLMs for code generation, highlighting ChatGPT's superior performance and analyzing factors affecting output quality.

Findings

1. ChatGPT outperforms other models in solving LeetCode problems.

2. Performance decreases with longer prompts and more context.

3. Error analysis reveals common issues like indentation and incorrect code structure.

Abstract

In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in various fields, including software engineering. Within the scope of this research, we evaluate five different state-of-the-art LLMs - Bard, BingChat, ChatGPT, Llama2, and Code Llama - concerning their capabilities for text-to-code generation. In an empirical study, we feed prompts with textual descriptions of coding problems sourced from the programming website LeetCode to the models with the task of creating solutions in Python. Subsequently, the quality of the generated outputs is assessed using the testing functionalities of LeetCode. The results indicate large differences in performance between the investigated models. ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama. To gain further…
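
The evaluation loop the abstract describes (take a model's generated Python solution, run it against test cases, record pass or fail) can be sketched locally. This is a minimal illustration only, not the authors' setup: the study relies on LeetCode's own testing functionality, which this sketch replaces with a plain in-process check. The helper `passes_all_tests`, the model labels, the toy `add` problem, and the generated snippets are all hypothetical placeholders, not data from the paper.

```python
def passes_all_tests(source, func_name, test_cases):
    """Return True if the generated source defines func_name and
    func_name returns the expected value for every test case.

    Any exception, including SyntaxError and IndentationError raised
    while compiling the source, counts as a failure.
    """
    namespace = {}
    try:
        # Only ever exec code you trust; a real harness would sandbox this.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in test_cases)
    except Exception:
        return False


# Hypothetical generated outputs for a toy problem (not from the study).
generated = {
    "model_a": "def add(a, b):\n    return a + b\n",
    # Missing indentation: fails before any test case runs.
    "model_b": "def add(a, b):\nreturn a + b\n",
}

# Each test case is (positional arguments, expected return value).
tests = [((1, 2), 3), ((0, 0), 0)]

results = {
    model: passes_all_tests(src, "add", tests)
    for model, src in generated.items()
}
print(results)
```

In this sketch a solution with broken indentation is rejected at compile time, which is how an indentation error of the kind listed in the findings would surface as a failed submission.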

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education

Methods: LLaMA