A Full-fledged Commit Message Quality Checker Based on Machine Learning
David Faragó, Michael Färber, Christian Petrov

TL;DR
This paper presents an open-source machine learning framework that automatically assesses commit message quality based on established guidelines, improving software maintenance and developer practices.
Contribution
It introduces a comprehensive, practical tool that evaluates commit messages against quality rules using state-of-the-art machine learning models, filling a gap in existing tools.
Findings
Reached an F score of at least 82.9% on every rule check; the minimum occurs on the most challenging task
Developed a full framework for commit message quality checking
Supports research and practical software development improvements
Abstract
Commit messages (CMs) are an essential part of version control. By providing important context in regard to what has changed and why, they strongly support software maintenance and evolution. But writing good CMs is difficult and often neglected by developers. So far, there is no tool suitable for practice that automatically assesses how well a CM is written, including its meaning and context. Since this task is challenging, we ask the research question: how well can the CM quality, including semantics and context, be measured with machine learning methods? By considering all rules from the most popular CM quality guideline, creating datasets for those rules, and training and evaluating state-of-the-art machine learning models to check those rules, we can answer the research question with: sufficiently well for practice, with the lowest F score of 82.9%, for the most challenging…
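To make the "lowest F score" figure concrete, here is a minimal sketch of how a per-rule F score and its minimum could be computed. It assumes the balanced F1 variant (the abstract says only "F score"); the rule names and confusion counts are hypothetical placeholders, not data from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts; returns 0.0 when undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical (true positive, false positive, false negative) counts
# for two rule checkers. These numbers are illustrative only.
counts = {
    "rule_a": (90, 5, 5),
    "rule_b": (80, 20, 20),
}

# Score every rule, then pick the rule with the lowest score.
scores = {rule: f_score(*c) for rule, c in counts.items()}
worst_rule = min(scores, key=scores.get)

print(worst_rule, round(scores[worst_rule], 3))
```

Reporting the minimum over all rules, as the abstract does, gives a conservative summary: every other rule check scores at least as well as the one named.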
Taxonomy
Topics: Software Engineering Research · Data Quality and Management · Software System Performance and Reliability
