Evaluation of the Programming Skills of Large Language Models

Luc Bryan Heitz; Joun Chamas; Christopher Scherb

arXiv:2405.14388·cs.SE·May 24, 2024

TL;DR

This paper evaluates the quality of programming code generated by two leading large language models, ChatGPT and Gemini AI, using a real-world example and a systematic dataset to assess their efficacy and reliability.

Contribution

It provides a comparative analysis of code output quality from ChatGPT and Gemini AI, highlighting their strengths and limitations in programming tasks.

Findings

01 ChatGPT and Gemini AI both produce high-quality code but differ in accuracy.

02 The systematic dataset reveals differences in code correctness and reliability.

03 The study underscores the importance of evaluating LLMs for software development applications.

Abstract

The advent of Large Language Models (LLMs) has revolutionized the efficiency and speed with which tasks are completed, marking a significant leap in productivity through technological innovation. As these chatbots tackle increasingly complex tasks, the challenge of assessing the quality of their outputs has become paramount. This paper critically examines the output quality of two leading LLMs, OpenAI's ChatGPT and Google's Gemini AI, by comparing the quality of programming code generated in both their free versions. Through the lens of a real-world example coupled with a systematic dataset, we investigate the code quality produced by these LLMs. Given their notable proficiency in code generation, this aspect of chatbot capability presents a particularly compelling area for analysis. Furthermore, the complexity of programming code often escalates to levels where its verification becomes…
