Toward the Automatic Classification of Self-Affirmed Refactoring
Eman Abdullah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni

TL;DR
This paper presents an automated approach to classifying self-affirmed refactoring commits, replacing the manual inspection of commit messages that earlier work relied on and uncovering additional refactoring patterns.
Contribution
The paper introduces a two-step machine learning approach that combines N-Gram TF-IDF feature selection with classifiers: it first identifies whether a commit describes refactoring, then assigns the commit to one of the quality improvement categories.
Findings
Achieves an F-measure of up to 90%.
Outperforms pattern-based and random classifiers.
Discovers 40 additional relevant SAR patterns.
Abstract
The concept of Self-Affirmed Refactoring (SAR) was introduced to explore how developers document their refactoring activities in commit messages, i.e., developers' explicit documentation of refactoring operations intentionally introduced during a code change. In our previous study, we have manually identified refactoring patterns and defined three main common quality improvement categories, including internal quality attributes, external quality attributes, and code smells, by only considering refactoring-related commits. However, this approach heavily depends on the manual inspection of commit messages. In this paper, we propose a two-step approach to first identify whether a commit describes developer-related refactoring events, then to classify it according to the refactoring common quality improvement categories. Specifically, we combine the N-Gram TF-IDF feature selection with…
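The two-step idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it names the classifiers, so a simple nearest-centroid rule over N-gram TF-IDF vectors stands in for them here. The training messages, the label names (`refactoring`, `other`, `internal`, `external`, `code_smell`, `not_refactoring`), and the helper `classify_commit` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return all word 1..n_max-grams of a commit message."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class TfidfCentroidClassifier:
    """N-gram TF-IDF features with a nearest-centroid (cosine) decision rule.

    Assumption: this rule is a stand-in for whatever classifiers the paper uses.
    """

    def fit(self, texts, labels):
        n_docs = len(texts)
        doc_freq = Counter()
        for text in texts:
            doc_freq.update(set(ngrams(text)))
        # Smoothed inverse document frequency per n-gram.
        self.idf = {g: math.log((1 + n_docs) / (1 + df)) + 1
                    for g, df in doc_freq.items()}
        # Sum the unit-length TF-IDF vectors of each class.
        self.centroids = defaultdict(lambda: defaultdict(float))
        for text, label in zip(texts, labels):
            for gram, weight in self.vectorize(text).items():
                self.centroids[label][gram] += weight
        return self

    def vectorize(self, text):
        counts = Counter(ngrams(text))
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {g: w / norm for g, w in vec.items()} if norm else {}

    def predict(self, text):
        vec = self.vectorize(text)

        def cosine(label):
            centroid = self.centroids[label]
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            dot = sum(w * centroid.get(g, 0.0) for g, w in vec.items())
            return dot / norm if norm else 0.0

        return max(sorted(self.centroids), key=cosine)


# Hypothetical toy data, invented for illustration only.
STEP1 = [
    ("refactor code to improve readability", "refactoring"),
    ("extract method to reduce code duplication", "refactoring"),
    ("refactor class to reduce coupling", "refactoring"),
    ("remove code smell long method", "refactoring"),
    ("fix null pointer bug in login", "other"),
    ("add new feature for user export", "other"),
    ("update version number for release", "other"),
]
STEP2 = [
    ("refactor class to reduce coupling", "internal"),
    ("refactor to improve cohesion and reduce complexity", "internal"),
    ("refactor code to improve readability", "external"),
    ("cleanup to improve performance and testability", "external"),
    ("remove code smell long method", "code_smell"),
    ("extract method to reduce code duplication", "code_smell"),
]

detector = TfidfCentroidClassifier().fit(*zip(*STEP1))
categorizer = TfidfCentroidClassifier().fit(*zip(*STEP2))


def classify_commit(message):
    """Step 1: is the commit refactoring-related? Step 2: which category?"""
    if detector.predict(message) != "refactoring":
        return "not_refactoring"
    return categorizer.predict(message)
```

Calling `classify_commit("refactor module to reduce coupling")` runs both steps in sequence. With a realistic corpus, the centroid rule would presumably be swapped for trained classifiers and evaluated with F-measure, as the findings above report.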
Taxonomy
Methods: Feature Selection
