CommitSuite: A Comprehensive Benchmark for Commit Classification and Message Generation
Zirui Wan, Zhaonan Wu, Xinyi Hou, Yanjie Zhao, Pengcheng Xia, Haoyu Wang

TL;DR
CommitSuite introduces a large-scale, CCS-compliant commit benchmark with semantic annotations and a novel reference-free evaluation framework, advancing research in commit classification and message generation.
Contribution
CommitSuite provides the first large-scale benchmark of CCS-compliant commits with semantic annotations, together with a new reference-free evaluation method for commit classification and commit message generation.
Findings
LLMs support commit message generation effectively.
The evaluation framework achieves a Cohen's Kappa agreement of 0.849.
CommitSuite enables reproducible research in commit understanding.
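For readers unfamiliar with the agreement statistic cited in the findings, Cohen's Kappa compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula for categorical labels. The function name and the sample labels are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: fraction of items on which the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal probabilities of choosing it, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here, and this sketch reports it as perfect agreement.
        return 1.0

    return (p_o - p_e) / (1 - p_e)


# Hypothetical binary judgments (1 = pass, 0 = fail) from two raters.
judge_a = [1, 1, 0, 0]
judge_b = [1, 1, 0, 1]
kappa = cohens_kappa(judge_a, judge_b)
```

In this toy example the raters agree on three of four items (p_o = 0.75) and chance agreement is p_e = 0.5, so kappa is 0.5. Values close to 1 indicate agreement well beyond chance.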
Abstract
High-quality commit messages are critical for maintaining software projects, yet ensuring their consistency and informativeness remains a practical challenge. While the Conventional Commits Specification (CCS) provides a structured format for commit messages, research on CCS-based commit classification and commit message generation (CMG) is limited by the absence of large-scale benchmarks, semantic annotations, and reliable evaluation methods. In this paper, we introduce CommitSuite, a benchmark comprising 63,533 CCS-compliant commits from 243 open-source repositories across seven programming languages. Each commit is labeled with its CCS type and enriched with AST-level code changes, along with LLM-assisted semantic annotations that capture the "what" and "why" behind the change. To evaluate CMG systems, we propose a reference-free framework based on five binary metrics: rationality,…
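To make the notion of a CCS-compliant commit concrete, the sketch below parses a commit header of the form `type(scope)!: description` defined by the Conventional Commits Specification. It is an illustration only, not the authors' filtering pipeline: the regular expression, the `CommitHeader` type, and the `parse_ccs_header` function are assumptions introduced here, and the parser deliberately does not restrict the set of allowed types, since the label set used by CommitSuite is not given in this excerpt.

```python
import re
from typing import NamedTuple, Optional

# Header shape per the Conventional Commits Specification:
#   <type>[optional (scope)][optional !]: <description>
_HEADER = re.compile(
    r"^(?P<type>[A-Za-z]+)"            # commit type, e.g. feat or fix
    r"(?:\((?P<scope>[^()\r\n]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"                # optional breaking-change marker
    r": (?P<description>\S.*)$"        # colon, space, non-empty description
)


class CommitHeader(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_ccs_header(message: str) -> Optional[CommitHeader]:
    """Return the parsed header, or None if the first line is not CCS-shaped."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = _HEADER.match(first_line)
    if match is None:
        return None
    return CommitHeader(
        type=match["type"].lower(),
        scope=match["scope"],
        breaking=match["breaking"] is not None,
        description=match["description"].strip(),
    )


# Hypothetical commit messages for illustration.
compliant = parse_ccs_header("feat(parser)!: add array support\n\nLonger body text.")
non_compliant = parse_ccs_header("Update README")
```

Here `compliant` carries the type `feat`, the scope `parser`, and a breaking-change flag, while `non_compliant` is `None` because the first line has no `type: description` structure. A classifier for CCS types would treat the parsed `type` field as the label to predict.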
