Evaluating Source Code Quality with Large Language Models: a comparative study

Igor Regis da Silva Simões; Elaine Venson

arXiv:2408.07082·cs.SE·October 7, 2024

Open Access

TL;DR

This study explores the potential of large language models to evaluate source code quality, comparing their assessments with traditional static analysis tools like SonarQube across open source Java projects.

Contribution

It provides an empirical comparison of GPT 3.5 Turbo and GPT 4o in assessing code quality, highlighting their capabilities and limitations.

Findings

01. GPT 3.5 Turbo correlates with SonarQube metrics
02. GPT 4o diverges from traditional assessments
03. LLMs show potential but have limitations in code quality evaluation

Abstract

Code quality is an attribute composed of various metrics, such as complexity, readability, testability, interoperability, reusability, and the use of good or bad practices, among others. Static code analysis tools aim to measure a set of attributes to assess code quality. However, some quality attributes can only be measured by humans in code review activities, readability being an example. Given their natural language text processing capability, we hypothesize that a Large Language Model (LLM) could evaluate the quality of code, including attributes currently not automatable. This paper aims to describe and analyze the results obtained using LLMs as a static analysis tool, evaluating the overall quality of code. We compared the LLM with the results obtained with the SonarQube software and its Maintainability metric for two Open Source Software (OSS) Java projects, one with…
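The comparison described above comes down to checking whether per-file scores from an LLM move together with the scores from a static analysis tool. The excerpt does not say which statistic the authors used, so the following is only a minimal sketch that assumes Spearman rank correlation. The score lists and the names `llm_scores` and `sonar_scores` are hypothetical and are not data from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical quality scores for five files (higher = better in both lists).
llm_scores = [80, 65, 90, 40, 75]
sonar_scores = [70, 60, 95, 50, 85]

rho = spearman(llm_scores, sonar_scores)
print(f"Spearman rho: {rho:.2f}")
```

A value of rho near 1 would mean the two tools rank the files similarly, and a value near 0 would mean their rankings are unrelated. Rank correlation is used here because an LLM score and a tool metric need not share a scale.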

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis