Human-Like Code Quality Evaluation through LLM-based Recursive Semantic Comprehension

Fangzhou Xu; Sai Zhang; Zhenchang Xing; Xiaowang Zhang; Yahong Han; Zhiyong Feng

arXiv:2412.00314·cs.SE·December 3, 2024

TL;DR

This paper introduces HuCoSC, a recursive LLM-based method for evaluating code quality by understanding code semantics more accurately, outperforming existing methods in aligning with human judgment and execution results.

Contribution

The paper presents a novel recursive semantic comprehension approach using LLMs and a semantic dependency storage to improve code quality evaluation accuracy.
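The general idea of recursive comprehension backed by a dependency store can be illustrated with a small sketch. This is not the authors' implementation: the function names `summarize` and `comprehend`, the use of Python's `ast` module to find call dependencies, and the string-building stand-in for the LLM call are all assumptions made for illustration. The sketch summarizes each function only after the functions it calls have been summarized, and caches every summary in a dictionary so it can be reused.

```python
import ast


def summarize(name, source, callee_summaries):
    """Hypothetical stand-in for an LLM call: builds a plain string summary."""
    deps = ", ".join(f"{k}: {v}" for k, v in sorted(callee_summaries.items()))
    n_lines = len(source.splitlines())
    return f"{name} ({n_lines} lines)" + (f" uses [{deps}]" if deps else "")


def comprehend(code):
    """Summarize every top-level function, callees before callers."""
    tree = ast.parse(code)
    funcs = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    store = {}  # illustrative "semantic dependency storage": name -> summary

    def visit(name, stack=()):
        if name in store:  # reuse a summary that was already computed
            return store[name]
        node = funcs[name]
        # Direct calls to other top-level functions; names already on the
        # current recursion stack are skipped to avoid infinite recursion.
        callees = {
            c.func.id
            for c in ast.walk(node)
            if isinstance(c, ast.Call)
            and isinstance(c.func, ast.Name)
            and c.func.id in funcs
            and c.func.id != name
            and c.func.id not in stack
        }
        callee_summaries = {c: visit(c, stack + (name,)) for c in sorted(callees)}
        source = ast.get_source_segment(code, node)
        store[name] = summarize(name, source, callee_summaries)
        return store[name]

    for name in funcs:
        visit(name)
    return store


example = "def helper(x):\n    return x + 1\n\ndef main(y):\n    return helper(y) * 2\n"
summaries = comprehend(example)
print(summaries["main"])
```

In a real evaluator the `summarize` stub would be replaced by a model call that receives the function source together with the stored summaries of its dependencies; how the paper structures that prompt is not stated here.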

Findings

01

HuCoSC outperforms state-of-the-art methods in correlation with human judgment.

02

HuCoSC achieves higher correlation with code execution results.

03

The recursive semantic approach enhances LLM understanding of code semantics.
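As a rough illustration of what "correlation with human judgment" means in practice, the sketch below computes Kendall's tau (the tau-a variant, which ignores tie corrections) between human ratings and the scores an automatic evaluator assigns to the same code samples. The choice of Kendall's tau, the function name `kendall_tau`, and the sample scores are assumptions for illustration; this summary does not state which correlation coefficient the paper reports.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both rankings order this pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up scores for four code samples (illustrative only).
human = [4, 2, 3, 1]
metric_a = [0.9, 0.4, 0.7, 0.1]  # ranks samples exactly as the humans do
metric_b = [0.5, 0.6, 0.2, 0.3]  # agrees on half of the pairs

print(kendall_tau(human, metric_a))  # 1.0
print(kendall_tau(human, metric_b))  # 0.0
```

An evaluator whose scores yield a tau closer to 1 ranks code samples more like the human raters do. The same computation applies to execution results if the human ratings are replaced by pass/fail outcomes or pass rates.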

Abstract

Code quality evaluation involves scoring the quality of generated code against a reference code for a specific problem statement. Currently, there are two main forms of code quality evaluation: match-based evaluation and execution-based evaluation. The former relies on superficial code matching as an evaluation metric, which fails to accurately capture code semantics. The latter requires collecting a large number of test cases, which is costly. Moreover, extensive research has demonstrated that match-based evaluations do not truly reflect code quality. With the development of large language models (LLMs) in recent years, studies have shown the feasibility of using LLMs as evaluators for generative tasks. However, due to issues such as hallucinations and uncertainty in LLMs, their correlation with human judgment remains low, making the direct use of LLMs for code quality…

Taxonomy

Topics: Natural Language Processing Techniques