Conventional Commit Classification using Large Language Models and Prompt Engineering
H. M. Sazzad Quadir, Sakib Al Hasan, Md. Nurul Ahad Tawhid

TL;DR
This paper explores using large language models with prompt engineering as a training-free method for classifying conventional commits, demonstrating that few-shot prompting with larger models yields high accuracy.
Contribution
It introduces a training-free approach that uses LLMs and prompting strategies for commit classification, avoiding the need for large labeled training datasets and fine-tuning.
Findings
Few-shot prompting achieves higher accuracy than zero-shot and chain-of-thought prompting.
Larger models like DeepSeek-R1-32B perform best.
Chain-of-thought prompting does not improve classification results.
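These findings compare prompting strategies that differ mainly in how the prompt is assembled; the model and the input diff stay the same. The sketch below illustrates that difference. It is an illustration only, not the authors' prompts. The label set `LABELS`, the prompt wording, and the helper names `Example`, `build_prompt`, and `parse_label` are all hypothetical, and the actual LLM call is omitted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set (common Conventional Commits types); the paper's
# exact labels are not listed in this summary.
LABELS = ["feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore"]


@dataclass(frozen=True)
class Example:
    """One labeled demonstration used by the few-shot strategy."""
    diff: str
    label: str


def build_prompt(diff: str, strategy: str, examples: Optional[List[Example]] = None) -> str:
    """Assemble a prompt for one diff; strategy is zero-shot, few-shot, or chain-of-thought."""
    parts = [
        "Classify the code diff below into exactly one Conventional Commit type.\n"
        f"Allowed types: {', '.join(LABELS)}.\n"
    ]
    if strategy == "few-shot":
        # Labeled demonstrations precede the query diff.
        for ex in examples or []:
            parts.append(f"Diff:\n{ex.diff}\nType: {ex.label}\n")
    elif strategy == "chain-of-thought":
        # Ask for reasoning first and the label on the final line.
        parts.append(
            "Explain step by step what the change does, then end with a line "
            "'Type: <type>'.\n"
        )
    elif strategy != "zero-shot":
        raise ValueError(f"unknown strategy: {strategy}")
    cue = "Reasoning:" if strategy == "chain-of-thought" else "Type:"
    parts.append(f"Diff:\n{diff}\n{cue}")
    return "\n".join(parts)


def parse_label(response: str) -> Optional[str]:
    """Map raw model output to an allowed label, or None if none is found."""
    # Scan explicit 'Type: x' matches from last to first (chain-of-thought puts
    # the answer at the end); otherwise use the first token of a bare answer.
    candidates = re.findall(r"Type:\s*(\w+)", response) or response.split()[:1]
    for cand in reversed(candidates):
        label = cand.strip(".:`'\"").lower()
        if label in LABELS:
            return label
    return None
```

Parsing is kept separate from prompting because zero-shot and few-shot prompts end at `Type:`. For those prompts, the model's reply is usually just the label. A chain-of-thought reply puts the label after free-form reasoning.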
Abstract
Conventional commits provide a structured format for writing commit messages, which improves readability and software maintenance and enables automation tools such as changelog generators and semantic versioning systems. Existing approaches to conventional commit classification typically rely on machine learning (ML) and deep learning (DL) models trained on large labeled datasets. In this paper, we investigate a training-free alternative that leverages large language models (LLMs) through prompt engineering. Rather than building a task-specific classifier, we evaluate three prompting strategies, namely zero-shot, few-shot, and chain-of-thought, across three open-source LLMs of varying scale: Mistral-7B-Instruct, LLaMA-3-8B, and DeepSeek-R1-32B. Classification is performed directly on code diffs extracted from a balanced dataset of 3,200 commits mined from the InfluxDB repository, without any model fine-tuning. Our results…
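The semantic versioning automation mentioned in the abstract relies on the commit type. The version-bump rules from the Conventional Commits 1.0.0 specification illustrate this. A `fix` commit maps to a patch release and a `feat` commit to a minor release. A `!` after the type or scope, or a `BREAKING CHANGE:` footer, maps to a major release. The sketch below covers those rules only; it is not part of the paper's method. The helper names `bump_level` and `next_version` are hypothetical. A type predicted by an LLM classifier would supply only the `type` part of the header.

```python
import re
from typing import Optional

# Header grammar from Conventional Commits 1.0.0: type(scope)!: description
HEADER_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)(?:\((?P<scope>[^()\r\n]+)\))?(?P<bang>!)?: (?P<desc>.+)$"
)
# Breaking-change footer token; the spec treats BREAKING-CHANGE as a synonym.
BREAKING_RE = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)


def bump_level(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch', or None (no release) for a commit message.

    Raises ValueError if the first line is not a Conventional Commits header.
    """
    lines = message.splitlines()
    header = lines[0] if lines else ""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not a conventional commit header: {header!r}")
    if m.group("bang") or BREAKING_RE.search(message):
        return "major"
    commit_type = m.group("type").lower()  # types are case-insensitive
    if commit_type == "feat":
        return "minor"
    if commit_type == "fix":
        return "patch"
    return None  # docs, refactor, test, ... carry no implicit version bump


def next_version(version: str, level: Optional[str]) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

For example, `next_version("1.4.2", bump_level("feat(api): add query timeout"))` yields `"1.5.0"`. This is why a misclassified type matters downstream. Labeling a `feat` change as `fix` would produce a patch release instead of a minor one.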
