Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond
Pengyu Xue, Linhao Wu, Zhongxing Yu, Zhi Jin, Zhen Yang, Xinyi Li, Zhenyu Yang, Yue Tan

TL;DR
This study systematically evaluates large language models (LLMs) for automated commit message generation. It shows that they outperform existing methods and proposes a retrieval-based framework that improves both performance and efficiency.
Contribution
This work is the first comprehensive empirical analysis of LLMs for commit message generation. It also introduces ERICommiter, a retrieval-based framework that improves accuracy and efficiency.
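This page does not describe how ERICommiter performs retrieval or builds its prompts. The sketch below shows only the general idea of retrieval-based prompting for commit message generation: find the stored (diff, message) pairs most similar to a new diff, then place them in an LLM prompt as demonstrations. It is a hypothetical sketch and not ERICommiter. The bag-of-tokens cosine similarity, the helper names `tokenize`, `cosine`, `retrieve_examples` and `build_prompt`, the sample pool, and the prompt wording are all illustrative assumptions.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(diff: str) -> list[str]:
    """Split a diff into lowercase word/identifier tokens (simplifying assumption)."""
    return re.findall(r"[a-z0-9_]+", diff.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-tokens count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(query_diff: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (diff, message) pairs in `pool` most similar to `query_diff`.

    Every pool entry is scored, so the cost grows linearly with the pool size.
    """
    q = Counter(tokenize(query_diff))
    ranked = sorted(pool, key=lambda ex: cosine(q, Counter(tokenize(ex[0]))), reverse=True)
    return ranked[:k]


def build_prompt(query_diff: str, examples: list[tuple[str, str]]) -> str:
    """Assemble a few-shot prompt: retrieved demonstrations first, then the query diff."""
    parts = ["Write a concise commit message for the final diff."]
    for diff, msg in examples:
        parts.append(f"Diff:\n{diff}\nCommit message: {msg}")
    parts.append(f"Diff:\n{query_diff}\nCommit message:")
    return "\n\n".join(parts)


# Hypothetical usage with a tiny, made-up pool of past commits.
pool = [
    ("- timeout = 30\n+ timeout = 60", "Increase request timeout to 60s"),
    ("+ def parse_args():\n+     ...", "Add CLI argument parsing"),
    ("- print(msg)\n+ logger.info(msg)", "Replace print with logger"),
]
query = "- timeout = 10\n+ timeout = 20"
top = retrieve_examples(query, pool, k=1)  # picks the timeout change
prompt = build_prompt(query, top)          # text that would be sent to an LLM
```

Only `retrieve_examples` would change if the toy similarity were replaced by BM25 or a learned code embedding. Because every pool entry is scored for each query, retrieval time in this naive version grows with the pool. That is the kind of overhead an efficiency-oriented framework would presumably aim to reduce.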
Findings
LLMs outperform state-of-the-art CMG approaches.
GPT-3.5 achieves the best overall performance.
ERICommiter significantly reduces retrieval time with minimal performance loss.
Abstract
Commit Message Generation (CMG) approaches aim to automatically generate commit messages from given code diffs. Such messages facilitate collaboration among developers and play a critical role in Open-Source Software (OSS). Very recently, Large Language Models (LLMs) have demonstrated broad applicability across diverse code-related tasks, yet few studies have systematically explored their effectiveness for CMG. This paper conducts the first comprehensive experiment to investigate how far we have come in applying LLMs to generate high-quality commit messages. Motivated by a pilot analysis, we first clean the most widely used CMG dataset following practitioners' criteria. Afterward, we re-evaluate diverse state-of-the-art CMG approaches and compare them with LLMs, demonstrating that LLMs outperform state-of-the-art CMG approaches. Then, we further propose four manual…
