Automated Classification of Source Code Changes Based on Metrics Clustering in the Software Development Process
Evgenii Kniazev

TL;DR
This paper introduces a clustering-based method for classifying source code changes during development. Changes are distributed into clusters automatically, and an expert then maps each cluster to a change class, which reduces the time needed for code change review.
Contribution
The paper proposes an automated approach that classifies code changes by applying k-means clustering with a cosine similarity measure to eleven source code metrics, and validates it on five software systems.
Findings
Achieved classification purity of approximately 0.75
Demonstrated effective clustering on open-source projects
Reduced manual effort in code change review
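The summary does not say how purity is computed. The sketch below assumes the standard definition of cluster purity: for each cluster, count the members of its most frequent true class, sum those counts, and divide by the total number of changes. The class names and toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def purity(cluster_labels, class_labels):
    """Fraction of items that belong to the majority class of their cluster.

    Assumes the standard purity definition; the paper's exact formula
    is not given in this summary.
    """
    by_cluster = {}
    for cluster, cls in zip(cluster_labels, class_labels):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    # For each cluster, only the members of its most common class count.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_labels)


# Hypothetical example: 8 changes, 2 clusters, expert-assigned classes.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
classes = ["fix", "fix", "fix", "feature",
           "feature", "feature", "feature", "refactor"]
score = purity(clusters, classes)
print(score)
```

In this toy case each cluster has three members in its majority class, so 6 of the 8 changes count toward purity.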
Abstract
This paper presents an automated method for classifying source code changes during the software development process based on clustering of change metrics. The method consists of two steps: clustering of metric vectors computed for each code change, followed by expert mapping of the resulting clusters to predefined change classes. The distribution of changes into clusters is performed automatically, while the mapping of clusters to classes is carried out by an expert. Automation of the distribution step substantially reduces the time required for code change review. The k-means algorithm with a cosine similarity measure between metric vectors is used for clustering. Eleven source code metrics are employed, covering lines of code, cyclomatic complexity, file counts, interface changes, and structural changes. The method was validated on five software systems, including two open-source…
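The clustering step described in the abstract can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes the metric vectors are normalized to unit length so that cosine similarity reduces to a dot product, and it uses a deterministic farthest-first initialization, which the source does not specify. The three-element toy vectors stand in for the eleven metrics, and all function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity of two unit-length vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_kmeans(vectors, k, iterations=50):
    """Cluster vectors with k-means, assigning by highest cosine similarity.

    Returns one cluster index per input vector.
    """
    points = [normalize(v) for v in vectors]
    # Assumed initialization: start from the first point, then repeatedly
    # add the point least similar to the centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = min(points,
                  key=lambda p: max(cosine_similarity(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [-1] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine_similarity(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: mean of the members, re-normalized to unit length.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Hypothetical metric vectors: (lines changed, files changed, interface changes).
changes = [
    [10, 1, 0], [20, 3, 0], [5, 1, 1],   # mostly line edits
    [0, 1, 10], [1, 2, 20], [1, 1, 9],   # mostly interface changes
]
labels = cosine_kmeans(changes, k=2)
print(labels)
```

Because cosine similarity ignores vector length, a small change and a large change with the same metric proportions fall into the same cluster. An expert would then map each resulting cluster index to one of the predefined change classes.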
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations
