Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings
Petr Tsvetkov, Aleksandra Eliseeva, Danny Dig, Alexander Bezzubov, Yaroslav Golubev, Timofey Bryksin, Yaroslav Zharov

TL;DR
This paper proposes a new evaluation approach for commit message generation systems by correlating offline similarity metrics with an online user edit-based metric, supported by a novel dataset and analysis.
Contribution
It introduces a practical online metric for CMG evaluation, a dataset of generated and edited commit messages, and reveals that edit distance correlates best with user edits, challenging previous metric assumptions.
Findings
Edit distance correlates highly with online user edits.
BLEU and METEOR show low correlation with user preferences.
User interactions with CMG differ from controlled human labeler responses.
Abstract
When a Commit Message Generation (CMG) system is integrated into the IDEs and other products at JetBrains, we perform online evaluation based on user acceptance of the generated messages. However, performing online experiments with every change to a CMG system is troublesome, as each iteration affects users and requires time to collect enough statistics. On the other hand, offline evaluation, a prevalent approach in the research literature, facilitates fast experiments but employs automatic metrics that are not guaranteed to represent the preferences of real users. In this work, we describe a novel way we employed to deal with this problem at JetBrains, by leveraging an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. To support this new type of evaluation, we develop a novel markup…
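The selection procedure described in the abstract can be illustrated with a minimal sketch: measure how much users edited each generated message, then check how well a candidate offline metric tracks those edits. The excerpt does not specify the exact edit count or correlation coefficient used, so character-level Levenshtein distance and Spearman rank correlation are assumptions here. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (assumed stand-in for the online edit count)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data invented for illustration only; not taken from the paper's dataset.
generated = ["Fix bug", "Update README file", "Add tests for parser"]
committed = ["Fix bug", "Update README", "Add unit tests for the parser module"]
online_edits = [edit_distance(g, c) for g, c in zip(generated, committed)]

# Placeholder scores standing in for some offline similarity metric.
offline_scores = [1.0, 0.7, 0.4]
correlation = spearman(online_edits, offline_scores)
```

A similarity metric that agrees with user behaviour should yield a strongly negative correlation with the edit counts, while a distance-style metric should yield a strongly positive one. Under this reading, candidate offline metrics would be ranked by the magnitude of that correlation.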
Taxonomy
Topics: Wireless Communication Networks Research
Methods: Attention Is All You Need · Adam · Dropout · Dense Connections · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
