From Commit Message Generation to History-Aware Commit Message Completion
Aleksandra Eliseeva, Yaroslav Sokolov, Egor Bogomolov, Yaroslav Golubev, Danny Dig, Timofey Bryksin

TL;DR
This paper introduces a history-aware approach to commit message completion, leveraging previous commit data and a new large dataset to improve message quality and relevance, with promising results for certain contexts.
Contribution
It proposes a novel commit message completion paradigm using historical context and introduces the CommitChronicle dataset for evaluation.
Findings
Commit message completion can outperform generation in some contexts.
Historical data improves commit message prediction accuracy.
GPT-3.5-turbo shows potential for detailed commit messages.
Abstract
Commit messages are crucial to software development, allowing developers to track changes and collaborate effectively. Despite their utility, most commit messages lack important information since writing high-quality commit messages is tedious and time-consuming. The active research on commit message generation (CMG) has not yet led to wide adoption in practice. We argue that if we could shift the focus from commit message generation to commit message completion and use previous commit history as additional context, we could significantly improve the quality and the personal nature of the resulting commit messages. In this paper, we propose and evaluate both of these novel ideas. Since the existing datasets lack historical data, we collect and share a novel dataset called CommitChronicle, containing 10.7M commits across 20 programming languages. We use this dataset to evaluate the…
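The abstract combines two ideas: completing a message from a prefix the developer has already typed instead of generating it from scratch, and conditioning on earlier commit messages. One way to see how they fit together is to look at how a single example might be assembled. The sketch below is a minimal illustration under stated assumptions, not the paper's input format. It assumes an encoder-decoder model that receives the diff in the encoder, with the history and the typed prefix supplied to the decoder. The `build_example` function, the `[SEP]` separator, the character-level prefix split, and the `max_history` cutoff are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical separator between history messages; the released
# models may use a different token or formatting.
SEP = "[SEP]"


@dataclass
class CompletionExample:
    encoder_input: str   # the code diff of the current commit
    decoder_prefix: str  # earlier messages + the typed start of the message
    target: str          # the remainder the model is asked to complete


def build_example(diff: str, history: List[str], message: str,
                  prefix_len: int, max_history: int = 3) -> CompletionExample:
    """Assemble one history-aware completion example (illustrative format)."""
    # Keep only the most recent messages. history[-0:] would return the
    # whole list, so max_history <= 0 is handled explicitly.
    recent = history[-max_history:] if max_history > 0 else []
    # Split the current message at a character offset (an assumption;
    # splitting at word boundaries is an equally plausible choice).
    prefix, target = message[:prefix_len], message[prefix_len:]
    # Earlier messages come first, followed by the typed prefix.
    decoder_prefix = f" {SEP} ".join(recent + [prefix])
    return CompletionExample(diff, decoder_prefix, target)


if __name__ == "__main__":
    history = ["Fix typo in docs", "Add unit tests for parser"]
    ex = build_example("diff --git a/README.md b/README.md", history,
                       "Update README badges", prefix_len=7)
    print(repr(ex.decoder_prefix))
    print(repr(ex.target))
```

With `prefix_len=0` and an empty history, the decoder prefix is empty and the target is the whole message. That reduces the example to plain commit message generation, the setting the paper contrasts completion with.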
Code & Models
- 🤗 JetBrains-Research/cmg-codet5-without-history (model)
- 🤗 JetBrains-Research/cmg-codet5-with-history (model)
- 🤗 JetBrains-Research/cmg-codereviewer-without-history (model)
- 🤗 JetBrains-Research/cmg-codereviewer-with-history (model)
- 🤗 JetBrains-Research/cmg-race-without-history (model)
- 🤗 JetBrains-Research/cmg-race-with-history (model)
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Weight Decay · Adam · Softmax · Cosine Annealing · Attention Dropout
