Automatically Generating Commit Messages from Diffs using Neural Machine Translation
Siyuan Jiang, Ameer Armaly, Collin McMillan

TL;DR
This paper presents a neural machine translation (NMT) approach that automatically generates concise, high-level commit messages from code diffs. The model is trained on commits from top GitHub projects, and a quality filter helps ensure the messages are useful.
Contribution
It adapts NMT for translating code diffs into commit messages and introduces a quality filter to improve message relevance and reliability.
Findings
Generated messages tend to be either very high or very low in quality.
A quality-assurance filter effectively detects unreliable messages.
The training corpus pairs diffs with human-written commit messages from the top 1k GitHub projects.
Abstract
Commit messages are a valuable resource in comprehension of software evolution, since they provide a record of changes such as feature additions and bug repairs. Unfortunately, programmers often neglect to write good commit messages. Different techniques have been proposed to help programmers by automatically writing these messages. These techniques are effective at describing what changed, but are often verbose and lack context for understanding the rationale behind a change. In contrast, humans write messages that are short and summarize the high level rationale. In this paper, we adapt Neural Machine Translation (NMT) to automatically "translate" diffs into commit messages. We trained an NMT algorithm using a corpus of diffs and human-written commit messages from the top 1k Github projects. We designed a filter to help ensure that we only trained the algorithm on higher-quality…
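To make the "translation" framing concrete, here is a minimal sketch of how a (diff, message) training pair might be prepared and screened before it reaches an NMT model. The abstract above is truncated and does not state the filter's criteria, so everything in this block is an illustrative assumption rather than the authors' method: the function names (`tokenize`, `first_sentence`, `make_pair`), the length limits, and the leading-verb heuristic are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical limits and verb list, chosen only for illustration.
# The source does not specify the filter's actual criteria.
MAX_DIFF_TOKENS = 100
MAX_MSG_TOKENS = 30
LEADING_VERBS = {"add", "fix", "remove", "update", "use", "change", "rename"}


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def first_sentence(message: str) -> str:
    """Keep only the first line (subject line) of a commit message."""
    lines = message.strip().splitlines()
    return lines[0].strip() if lines else ""


def make_pair(diff: str, message: str) -> Optional[Tuple[List[str], List[str]]]:
    """Return (source_tokens, target_tokens), or None if the pair is rejected.

    The diff is the source "sentence" and the commit message is the
    target "sentence" of the translation task.
    """
    src = tokenize(diff)
    tgt = tokenize(first_sentence(message).lower())
    if not src or not tgt:
        return None  # nothing to translate from or to
    if len(src) > MAX_DIFF_TOKENS or len(tgt) > MAX_MSG_TOKENS:
        return None  # assumed length cap to keep sequences short
    if tgt[0] not in LEADING_VERBS:
        return None  # assumed heuristic: message should start with an action verb
    return src, tgt
```

For example, `make_pair(diff, "Fix default value of foo")` would yield a token pair, while a message such as `"WIP"` would be rejected by the assumed leading-verb check. Pairs that survive would then be fed to any sequence-to-sequence model as source and target sequences.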
Taxonomy
Topics: Software Engineering Research · Advanced Software Engineering Methodologies · Software Reliability and Analysis Research
