Towards Automatic Generation of Short Summaries of Commits
Siyuan Jiang, Collin McMillan

TL;DR
This paper explores automatic generation of short commit summaries. It analyzes human-written commit messages, proposes a 'verb+object' format for summaries, and takes a first step by training a classifier that assigns a verb to each diff.
Contribution
The paper introduces a novel approach using a 'verb+object' format for commit summaries and presents an initial classifier to assign verbs based on diffs, inspired by linguistic patterns in human messages.
Findings
82% of human-written commit messages consist of a single sentence
Automated messages often have multiple lines
Initial classifier can assign verbs to diffs
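
The verb-assignment step can be illustrated with a minimal sketch. The summary above does not say which features or model the authors used, so everything here is an assumption made for illustration: labels are taken from the first word of each human-written message, features are word tokens from added and removed diff lines, and the model is a small multinomial Naive Bayes with add-one smoothing. The function names (tokenize, leading_verb, train, predict) are hypothetical and not from the paper.

```python
import math
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[A-Za-z_]+")


def tokenize(diff):
    """Return word tokens from changed lines, tagged as added or deleted.

    Assumption: a unified diff where '+'/'-' mark changed lines and
    '+++'/'---' mark file headers (a simplification).
    """
    tokens = []
    for line in diff.splitlines():
        if line.startswith(("+++", "---")):
            continue  # skip file header lines
        if line.startswith("+"):
            tag = "add"
        elif line.startswith("-"):
            tag = "del"
        else:
            continue  # ignore context lines and hunk headers
        tokens.extend(f"{tag}:{w.lower()}" for w in WORD.findall(line[1:]))
    return tokens


def leading_verb(message):
    """Use the first word of the message as the verb label (assumption)."""
    words = message.split()
    if not words:
        raise ValueError("empty commit message")
    return words[0].lower()


def train(pairs):
    """Count tokens per verb from (diff, message) pairs."""
    verb_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for diff, message in pairs:
        verb = leading_verb(message)
        verb_counts[verb] += 1
        for tok in tokenize(diff):
            token_counts[verb][tok] += 1
            vocab.add(tok)
    return verb_counts, token_counts, vocab


def predict(model, diff):
    """Return the verb with the highest Naive Bayes log-score."""
    verb_counts, token_counts, vocab = model
    total = sum(verb_counts.values())
    tokens = [t for t in tokenize(diff) if t in vocab]  # drop unseen tokens
    best_verb, best_score = None, -math.inf
    for verb, n in verb_counts.items():
        denom = sum(token_counts[verb].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokens:
            score += math.log((token_counts[verb][tok] + 1) / denom)
        if score > best_score:
            best_verb, best_score = verb, score
    return best_verb
```

A model trained this way would be called as predict(train(pairs), diff), where pairs is a list of (diff, message) tuples. The predicted verb would then be combined with a separately generated object to form the 'verb+object' summary; object generation is not sketched here.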
Abstract
Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often have multiple lines. Furthermore, we found that the commit messages often begin with a verb followed by a direct object. This finding inspired us to use a "verb+object" format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As our first try, we trained a classifier to classify…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Web Application Security Vulnerabilities
