Distilling Desired Comments for Enhanced Code Review with Large Language Models

Yongda Yu; Lei Zhang; Guoping Rong; Haifeng Shen; Jiahao Zhang; Haoxiang Yan; Guohao Shi; Dong Shao; Ruiqi Pan; Yuan Li; Qiushi Wang; Zhao Tian

arXiv:2412.20340 · cs.SE · January 7, 2025

Open Access

TL;DR

This paper introduces Desiview, a dataset distillation method that automatically extracts desired review comments from code review data, significantly improving LLMs' ability to generate accurate and relevant code review comments.

Contribution

The paper proposes Desiview, a novel automatic dataset distillation approach for enhancing LLMs in code review tasks, and demonstrates its effectiveness with state-of-the-art performance.

Findings

01. Desiview achieves over 88% precision and 86% accuracy in identifying desired review comments.

02. Fine-tuning LLaMA models with the distilled dataset improves their code review comment generation.

03. Enhanced models outperform base LLMs in accuracy and relevance of review comments.
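
For readers less familiar with the two metrics quoted in finding 01, the following is a minimal sketch of how precision and accuracy are computed for a binary "desired / not desired" classifier. The function name and the toy labels are illustrative assumptions; they are not taken from the paper or its data.

```python
from typing import Sequence, Tuple


def precision_and_accuracy(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[float, float]:
    """Return (precision, accuracy) for binary labels.

    True means "desired review comment"; False means "not desired".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    # True positives: predicted desired and actually desired.
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    # False positives: predicted desired but actually not desired.
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    # Correct predictions of either class.
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    # Precision: share of "desired" predictions that are right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Accuracy: share of all predictions that are right.
    accuracy = correct / len(y_true)
    return precision, accuracy


# Toy labels invented for illustration only (not results from the paper).
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
precision, accuracy = precision_and_accuracy(y_true, y_pred)
print(f"precision={precision:.3f} accuracy={accuracy:.3f}")
```

High precision matters for distillation in particular: it measures how many of the comments kept in the distilled set are in fact desired ones.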

Abstract

There has been a growing interest in using Large Language Models (LLMs) for code review thanks to their proven proficiency in code comprehension. The primary objective of most review scenarios is to generate desired review comments (DRCs) that explicitly identify issues to trigger code fixes. However, existing LLM-based solutions are not so effective in generating DRCs for various reasons such as hallucination. To enhance their code review ability, they need to be fine-tuned with a customized dataset that is ideally full of DRCs. Nevertheless, such a dataset is not yet available, while manual annotation of DRCs is too laborious to be practical. In this paper, we propose a dataset distillation method, Desiview, which can automatically construct a distilled dataset by identifying DRCs from a code review dataset. Experiments on the CodeReviewer dataset comprising more than 150K review…
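
The abstract describes distillation as identifying DRCs within an existing code review dataset and keeping them. A minimal sketch of that filtering step is shown here, assuming some classifier is available as a predicate. `ReviewSample`, `distill`, and `toy_is_desired` are hypothetical names, and the keyword rule is an arbitrary stand-in for illustration, not the Desiview classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ReviewSample:
    """One code review record: a code change and the reviewer's comment."""
    code_diff: str
    comment: str


def distill(
    samples: Iterable[ReviewSample],
    is_desired: Callable[[ReviewSample], bool],
) -> List[ReviewSample]:
    """Keep only the samples that the predicate marks as desired."""
    return [s for s in samples if is_desired(s)]


def toy_is_desired(sample: ReviewSample) -> bool:
    # HYPOTHETICAL stand-in rule, NOT the paper's method: treat a comment
    # as "desired" if it contains the word "should".
    return "should" in sample.comment.lower()


# Invented examples for illustration only.
raw_dataset = [
    ReviewSample("- x = a / b\n+ x = a / b", "This should guard against b == 0."),
    ReviewSample("+ import os", "Why is this needed?"),
    ReviewSample("+ foo()", "LGTM"),
]

distilled = distill(raw_dataset, toy_is_desired)
print(len(distilled), "of", len(raw_dataset), "samples kept")
```

In a setup like the one the abstract outlines, the distilled list would then serve as the fine-tuning set; passing the classifier in as a parameter keeps the filtering logic independent of how DRCs are actually identified.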

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques

Methods: Balanced Selection · LLaMA