Deep Assessment of Code Review Generation Approaches: Beyond Lexical Similarity

Yanjie Jiang; Hui Liu; Tianyi Chen; Fu Fan; Chunhao Dong; Kui Liu; Lu Zhang

arXiv:2501.05176 · cs.SE · January 10, 2025

TL;DR

This paper introduces a semantic-similarity-based framework for automatically assessing generated code reviews; it outperforms traditional lexical metrics and correlates better with human judgment.

Contribution

It presents a new benchmark, GradedReviews, and two semantic-based evaluation methods for more accurate review assessment: one compares deep-learning vector representations of generated and reference reviews, and the other scores reviews with ChatGPT.
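The vector-comparison idea can be illustrated with a minimal sketch: embed each review as a vector and take the cosine similarity between the generated and reference vectors. The page does not specify the encoder or pooling, so everything below is an assumption for illustration: `TOY_VECTORS` is a hypothetical, hand-made word-vector table standing in for a learned deep-learning encoder, and `embed` uses simple mean pooling over known tokens.

```python
import math

# HYPOTHETICAL toy word vectors standing in for a learned encoder.
# The values are invented so the example runs on its own; a real
# system would obtain vectors from a pretrained model instead.
TOY_VECTORS = {
    "variable": [0.9, 0.1, 0.0],
    "name": [0.8, 0.2, 0.1],
    "identifier": [0.85, 0.15, 0.05],
    "unclear": [0.1, 0.9, 0.2],
    "confusing": [0.15, 0.85, 0.25],
    "add": [0.2, 0.1, 0.9],
    "test": [0.1, 0.2, 0.95],
    "the": [0.0, 0.0, 0.1],
    "this": [0.05, 0.0, 0.1],
    "is": [0.0, 0.05, 0.1],
    "a": [0.0, 0.0, 0.05],
}
DIMS = 3


def embed(text: str) -> list[float]:
    """Mean-pool the toy vectors of known tokens (assumed pooling scheme)."""
    tokens = [t for t in text.lower().split() if t in TOY_VECTORS]
    if not tokens:
        return [0.0] * DIMS
    return [sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens) for i in range(DIMS)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_score(generated: str, reference: str) -> float:
    """Score a generated review by vector similarity to its reference."""
    return cosine(embed(generated), embed(reference))


reference = "the variable name is unclear"
print(round(semantic_score("this identifier is confusing", reference), 2))
print(round(semantic_score("add a test", reference), 2))
```

With these invented vectors, the paraphrase "this identifier is confusing" scores roughly 0.98 against the reference while the unrelated "add a test" scores roughly 0.43, even though the paraphrase shares only the word "is" with the reference. That gap is the behavior a semantic metric is meant to capture and a lexical one misses.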

Findings

1. Semantic approaches outperform lexical metrics in correlation with human scores.

2. The proposed methods improve the correlation coefficient from 0.22 to 0.47.

3. The GradedReviews benchmark enables better assessment of code review quality.
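The findings compare metrics by how well their scores correlate with human scores. The page does not name the coefficient behind the 0.22 to 0.47 improvement, so the sketch below assumes a rank correlation such as Spearman's as one plausible choice. It computes Spearman's coefficient as the Pearson correlation of average ranks, which handles the many ties that human grades on a small ordinal scale tend to produce. The function names are illustrative, not taken from the paper.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j map to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(metric_scores: list[float], human_scores: list[float]) -> float:
    """Spearman's rank correlation between a metric and human judgments."""
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A metric that ranks reviews in exactly the same order as human graders yields 1.0, while one that reverses the order yields -1.0; the scores passed in would be per-review metric values and per-review human grades over the same set of reviews.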

Abstract

Code review is a standard practice for ensuring the quality of software projects, and recent research has focused extensively on automated code review. While significant advancements have been made in generating code reviews, the automated assessment of these reviews remains less explored, with existing approaches and metrics often proving inaccurate. Current metrics, such as BLEU, primarily rely on lexical similarity between generated and reference reviews. However, such metrics tend to underestimate reviews that articulate the expected issues in ways different from the references. In this paper, we explore how semantic similarity between generated and reference reviews can enhance the automated assessment of code reviews. We first present a benchmark called GradedReviews, which is constructed by collecting real-world code reviews from open-source projects, generating reviews…
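To see why lexical metrics underestimate paraphrased reviews, the sketch below computes the clipped (modified) n-gram precision that BLEU is built on. It is a simplification, not full BLEU: there is no brevity penalty, no geometric mean across n-gram orders, and no smoothing. The example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of candidate n-grams found in the reference, with each
    n-gram's count clipped to how often it appears in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Counter returns 0 for n-grams missing from the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


reference = "the variable name is unclear"
paraphrase = "this identifier is confusing"
print(clipped_precision(paraphrase, reference, n=1))  # 0.25
print(clipped_precision(paraphrase, reference, n=2))  # 0.0
```

Both reviews flag the same naming problem, yet the paraphrase shares only the token "is" with the reference, so its unigram precision is 0.25 and its bigram precision is 0.0. This is the kind of underestimation the abstract attributes to lexical metrics.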

Taxonomy

Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Software Reliability and Analysis Research