Automated Classification of Source Code Changes Based on Metrics Clustering in the Software Development Process

Evgenii Kniazev

arXiv:2602.14591 · cs.SE · February 17, 2026

TL;DR

This paper introduces a clustering-based method for classifying source code changes during development. Changes are distributed into clusters automatically, and an expert then maps each cluster to a predefined change class; automating the distribution step reduces code review time.

Contribution

The paper proposes a novel automated approach that classifies code changes by k-means clustering of code-metric vectors, validated on five software systems.

Findings

01 Achieved classification purity of approximately 0.75

02 Demonstrated effective clustering on open-source projects

03 Reduced manual effort in code change review
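
For readers unfamiliar with the purity figure in the first finding, the sketch below shows one common way to compute cluster purity: each cluster is credited with its most frequent class, and the credited counts are divided by the total number of items. This listing does not state the exact definition used in the paper, so the formula here is an assumption, and the cluster ids and class labels are made-up toy data, not the paper's results.

```python
from collections import Counter


def purity(cluster_ids, true_classes):
    """Fraction of items that belong to the majority class of their cluster.

    Assumption: this is the standard purity definition; the listing does not
    confirm that the paper uses exactly this formula.
    """
    by_cluster = {}
    for cid, cls in zip(cluster_ids, true_classes):
        by_cluster.setdefault(cid, Counter())[cls] += 1
    # Credit each cluster with the size of its most frequent class.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Toy data (hypothetical): two clusters of four changes each,
# with one change in each cluster belonging to the "wrong" class.
toy_clusters = [0, 0, 0, 0, 1, 1, 1, 1]
toy_classes = ["a", "a", "a", "b", "b", "b", "b", "a"]
print(purity(toy_clusters, toy_classes))  # 6 of 8 items match their cluster's majority class
```

A purity of 1.0 would mean every cluster contains changes of a single class; values near 0.75 mean roughly three in four changes share the dominant class of their cluster.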

Abstract

This paper presents an automated method for classifying source code changes during the software development process based on clustering of change metrics. The method consists of two steps: clustering of metric vectors computed for each code change, followed by expert mapping of the resulting clusters to predefined change classes. The distribution of changes into clusters is performed automatically, while the mapping of clusters to classes is carried out by an expert. Automation of the distribution step substantially reduces the time required for code change review. The k-means algorithm with a cosine similarity measure between metric vectors is used for clustering. Eleven source code metrics are employed, covering lines of code, cyclomatic complexity, file counts, interface changes, and structural changes. The method was validated on five software systems, including two open-source…
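
The two steps described in the abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the vectors have three made-up components rather than the paper's eleven metrics, the farthest-first initialization is an assumption chosen to keep the example deterministic, and the class names in `expert_mapping` are hypothetical placeholders for the predefined change classes.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))


def cosine_kmeans(vectors, k, iterations=50):
    """Cluster metric vectors with k-means under cosine similarity.

    Returns one cluster index per input vector.
    """
    points = [normalize(v) for v in vectors]
    # Assumed initialization (not from the paper): start from the first
    # point, then repeatedly add the point least similar to the centroids
    # chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = min(
            points,
            key=lambda p: max(cosine_similarity(p, c) for c in centroids),
        )
        centroids.append(farthest)
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each change goes to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine_similarity(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: mean of the members, re-normalized to unit length.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Step 1 (automatic): distribute toy metric vectors into clusters.
toy_vectors = [
    [10, 1, 0], [20, 3, 0], [30, 2, 1],   # dominated by the first metric
    [0, 1, 10], [1, 3, 20], [0, 2, 30],   # dominated by the third metric
]
cluster_ids = cosine_kmeans(toy_vectors, k=2)

# Step 2 (manual): an expert inspects each cluster and assigns a class.
# The class names below are hypothetical placeholders.
expert_mapping = {0: "class A", 1: "class B"}
change_classes = [expert_mapping[c] for c in cluster_ids]
print(cluster_ids)     # → [0, 0, 0, 1, 1, 1]
print(change_classes)
```

Because cosine similarity ignores vector length, a small change and a large change with the same proportions across metrics land in the same cluster, which is one plausible reason to prefer it over Euclidean distance for change metrics.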

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations