Evaluating Generated Commit Messages with Large Language Models

Qunhong Zeng; Yuxia Zhang; Zexiong Ma; Bo Jiang; Ningyuan Sun; Klaas-Jan Stol; Xingyu Mou; Hui Liu

arXiv:2507.10906·cs.SE·July 16, 2025

Open Access

TL;DR

This paper explores using Large Language Models (LLMs) as automated evaluators of generated commit messages, showing that they outperform traditional metrics and approach human-level quality assessment.

Contribution

The paper introduces an LLM-based evaluation method for commit messages and demonstrates its effectiveness and robustness compared to traditional automatic metrics.

Findings

01. LLMs with Chain-of-Thought reasoning excel in commit message evaluation
02. The proposed LLM evaluator outperforms traditional metrics in accuracy
03. The method offers a scalable alternative to human evaluation

Abstract

Commit messages are essential in software development as they serve to document and explain code changes. Yet, their quality often falls short in practice, with studies showing significant proportions of empty or inadequate messages. While automated commit message generation has advanced significantly, particularly with Large Language Models (LLMs), the evaluation of generated messages remains challenging. Traditional reference-based automatic metrics like BLEU, ROUGE-L, and METEOR have notable limitations in assessing commit message quality, as they assume a one-to-one mapping between code changes and commit messages, leading researchers to rely on resource-intensive human evaluation. This study investigates the potential of LLMs as automated evaluators for commit message quality. Through systematic experimentation with various prompt strategies and state-of-the-art LLMs, we…
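
The one-to-one mapping limitation mentioned in the abstract can be illustrated with a small sketch. The code below is not BLEU, ROUGE-L, or METEOR, and it is not the paper's method; it is a simplified unigram-overlap F1 that stands in for reference-based metrics in general. The function name `unigram_f1` and the three commit messages are hypothetical, chosen only for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a candidate message and a single reference.

    A simplified stand-in for reference-based metrics, not an
    implementation of BLEU, ROUGE-L, or METEOR.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most as often
    # as it appears in both messages.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical messages for the same imagined code change.
reference = "Fix null pointer exception in user login handler"
# A reasonable description of the same fix in different words.
paraphrase = "Prevent crash when session object is missing during sign-in"
# Nearly identical wording, but it describes a different change (logout).
near_copy = "Fix null pointer exception in user logout handler"

print(unigram_f1(paraphrase, reference))  # 0.0
print(unigram_f1(near_copy, reference))   # 0.875
```

In this toy case the paraphrase scores zero because it shares no tokens with the single reference, while the message that names the wrong handler scores high. That is the kind of mismatch that makes surface-overlap scores a weak proxy for commit message quality.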

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling