Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation
Luis Mayer, Christian Heumann, Matthias Aßenmacher

TL;DR
This study compares the performance of five large language models in text-to-code generation tasks, finding that ChatGPT outperforms specialized models like Code Llama in solving programming problems from LeetCode.
Contribution
It provides an empirical evaluation of multiple LLMs for code generation, highlighting ChatGPT's superior performance and analyzing factors affecting output quality.
Findings
ChatGPT outperforms other models in solving LeetCode problems.
Performance decreases with longer prompts and more context.
Error analysis reveals common issues such as indentation errors and incorrect code structure.
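As a rough illustration of how indentation and structural problems in generated Python could be detected before any tests run, here is a minimal sketch. It is not the authors' pipeline and not LeetCode's judge: the names `classify_submission` and `summarize`, the label strings, and the toy inputs are all hypothetical. It relies only on Python's built-in `compile`, which raises `IndentationError` (a subclass of `SyntaxError`) for badly indented code.

```python
from collections import Counter


def classify_submission(source):
    """Return a coarse, hypothetical label for one generated Python solution."""
    try:
        # Compile only; the code is never executed here.
        compile(source, "<generated>", "exec")
    except IndentationError:
        # Must be caught before SyntaxError, because it is a subclass of it.
        return "indentation_error"
    except SyntaxError:
        return "syntax_error"
    return "compiles"


def summarize(submissions):
    """Count labels per model; `submissions` maps a model name to source strings."""
    return {
        model: Counter(classify_submission(src) for src in sources)
        for model, sources in submissions.items()
    }


# Toy inputs for illustration only; these are not outputs from the study.
toy = {
    "model_a": ["def f(x):\n    return x", "def f(x):\nreturn x"],
    "model_b": ["def f(x):\n    return x +"],
}
summary = summarize(toy)
```

A check like this only separates code that fails to compile from code that does; whether a compiling solution is actually correct still depends on running it against test cases, as the study does through LeetCode.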
Abstract
In recent years, large language models (LLMs) have emerged as powerful tools with potential applications in various fields, including software engineering. Within the scope of this research, we evaluate five different state-of-the-art LLMs - Bard, BingChat, ChatGPT, Llama2, and Code Llama - concerning their capabilities for text-to-code generation. In an empirical study, we feed prompts with textual descriptions of coding problems sourced from the programming website LeetCode to the models with the task of creating solutions in Python. Subsequently, the quality of the generated outputs is assessed using the testing functionalities of LeetCode. The results indicate large differences in performance between the investigated models. ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama. To gain further…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: LLaMA
