Evaluating Commit Message Generation: To BLEU Or Not To BLEU?

Samanta Dey (1); Venkatesh Vinayakarao (1); Monika Gupta (2); Sampath Dechu (2) ((1) Chennai Mathematical Institute; (2) IBM Research)

arXiv:2204.09533 · cs.SE · April 21, 2022

TL;DR

This paper critically examines how well BLEU4 and similar metrics evaluate commit message generation (CMG) tools and proposes a new metric tailored to this task to make assessments more accurate.

Contribution

The paper identifies limitations of BLEU4 for evaluating commit messages, introduces a new, more suitable metric, and re-evaluates existing CMG tools with it.

Findings

1. BLEU4 and its variants have weaknesses as metrics for CMG evaluation.
2. A new metric better captures the quality of commit messages.
3. Re-evaluating existing CMG tools with the new metric yields different performance rankings.

Abstract

Commit messages play an important role in several software engineering tasks, such as program comprehension and understanding program evolution. However, programmers neglect to write good commit messages. Hence, several Commit Message Generation (CMG) tools have been proposed. We observe that recent state-of-the-art CMG tools use simple, easy-to-compute automated evaluation metrics such as BLEU4 or its variants. Advances in Machine Translation (MT) have exposed several weaknesses of BLEU4 and its variants, and MT research has also proposed several other metrics for evaluating Natural Language Generation (NLG) tools. In this work, we discuss the suitability of various MT metrics for the CMG task. Based on the insights from our experiments, we propose a new variant specifically for evaluating the CMG task. We re-evaluate the state-of-the-art CMG tools on our new metric. We believe that…
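One weakness of BLEU4 is easy to see on short commit messages. The Python sketch below is a minimal, hypothetical illustration. It is not the paper's proposed metric and not the exact BLEU implementation used by any CMG tool. It assumes whitespace tokenization, a single reference message, and, for the smoothed "variant", add-one smoothing of the n-gram counts for n = 2..4 (similar in spirit to Lin and Och's smoothed BLEU). The function names `ngrams` and `sentence_bleu4` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(reference, candidate, smooth=False):
    """Hypothetical sentence-level BLEU-4 for one reference and one candidate.

    smooth=False: plain BLEU-4; any zero n-gram precision makes the score 0.
    smooth=True: assumed variant that adds 1 to the matched and total
    n-gram counts for n = 2..4 before taking the precision.
    """
    ref, cand = reference.split(), candidate.split()
    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        total = sum(cand_counts.values())
        if smooth and n > 1:
            matches, total = matches + 1, total + 1
        if matches == 0 or total == 0:
            # A zero (or undefined) precision collapses the geometric mean to 0.
            return 0.0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / 4)


print(sentence_bleu4("fix typo in readme", "fix typo in docs"))               # → 0.0
print(sentence_bleu4("fix typo in readme", "fix typo in docs", smooth=True))  # ≈ 0.658
print(sentence_bleu4("bump version", "bump version"))                          # → 0.0
print(sentence_bleu4("bump version", "bump version", smooth=True))             # → 1.0
```

Without smoothing, a message shorter than four tokens scores 0 even against an identical reference, and changing one word of a four-word message also drives the score to 0. Smoothing removes these zeros but changes the scale, which illustrates why scores reported under different BLEU variants may not be directly comparable.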

Code & Models

Repositories

cmgeval/evaluating-cmg (official)