Improving Requirements Classification with SMOTE-Tomek Preprocessing

Barak Or

arXiv:2501.06491 · cs.SE · December 30, 2025

TL;DR

This paper shows that applying SMOTE-Tomek preprocessing with stratified cross-validation improves requirements classification, especially for minority classes, with machine learning models such as logistic regression: logistic regression accuracy rises from a 58.31% baseline to 76.16%.

Contribution

It introduces a novel combination of SMOTE-Tomek preprocessing with stratified cross-validation for requirements classification, enhancing minority class representation and model performance.
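A minimal pure-Python sketch of SMOTE-Tomek resampling follows. It assumes each requirement has already been converted into a numeric feature vector; how the paper vectorizes the requirement text is not stated here. `smote` creates synthetic points by interpolating between same-class nearest neighbours until every class with at least two samples matches the majority count. `tomek_links` finds cross-class pairs that are each other's nearest neighbour, and `smote_tomek` drops both members of every such link (removing only the majority-class member is a common alternative). All function names are illustrative, not taken from the paper, and the brute-force distance search only suits small data. For real workloads, imbalanced-learn provides a tested `SMOTETomek` class in `imblearn.combine`.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Oversample every non-majority class up to the majority count.

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [tuple(x) for x in X], list(y)
    for label, n in counts.items():
        if n == target or n < 2:
            continue  # already at majority size, or no neighbour to interpolate with
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            nbrs = sorted((j for j in idx if j != i),
                          key=lambda j: math.dist(X[i], X[j]))[:k]
            j = rng.choice(nbrs)
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
            y_out.append(label)
    return X_out, y_out


def tomek_links(X, y):
    """Index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X, y, k=5, seed=0):
    """SMOTE oversampling, then drop both members of every Tomek link."""
    X_res, y_res = smote(X, y, k, seed)
    drop = {i for pair in tomek_links(X_res, y_res) for i in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


if __name__ == "__main__":
    # Toy 2-D data: six "F" points near the origin, three "NFR" points near (5, 5).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 2), (5, 5), (5, 6), (6, 5)]
    y = ["F"] * 6 + ["NFR"] * 3
    X_bal, y_bal = smote_tomek(X, y, k=2)
    print(Counter(y_bal))
```

Tomek links are searched on the already oversampled set, so synthetic points can take part in a link; this is what lets the cleaning step trim noisy samples near the class boundary.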

Findings

1. Logistic regression accuracy improved to 76.16%, up from a 58.31% baseline.
2. SMOTE-Tomek preprocessing outperforms baseline methods.
3. Class imbalance in the requirements data is handled effectively.
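The size of the headline gain follows directly from the two accuracies reported in the abstract (76.16% for logistic regression versus the 58.31% baseline):

```latex
\Delta_{\mathrm{abs}} = 76.16\% - 58.31\% = 17.85 \text{ percentage points}, \qquad
\Delta_{\mathrm{rel}} = \frac{76.16 - 58.31}{58.31} \approx 0.306 \approx 30.6\%
```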

Abstract

This study focuses on requirements engineering, applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to address class imbalance in the PROMISE dataset. The dataset comprises 969 requirements, categorized as functional or non-functional. The proposed approach improves the representation of minority classes while maintaining the integrity of the validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved an accuracy of 76.16%, substantially surpassing the 58.31% baseline. These results highlight machine learning models as applicable, efficient, scalable, and interpretable solutions for requirements classification.
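The pairing of resampling with stratified K-fold cross-validation can be sketched as below, assuming the usual reading of "maintaining the integrity of validation folds": folds are built from the original labels, and only each training portion is resampled, so synthetic samples never reach a validation fold. `stratified_folds`, `resampled_splits`, and the placeholder `duplicate_minority` are hypothetical names. The placeholder is plain random duplication, used only to keep the block short; it is not SMOTE-Tomek, and a SMOTE-Tomek function with the same `(X, y) -> (X, y)` signature would be passed as `resample` instead. Model fitting (logistic regression in the paper) is left out.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(y, n_splits=5, seed=0):
    """Assign indices to folds so each fold keeps roughly the overall class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    offset = 0
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class round-robin; carry the offset so fold sizes stay even.
        for pos, i in enumerate(idx):
            folds[(offset + pos) % n_splits].append(i)
        offset += len(idx)
    return folds


def resampled_splits(X, y, resample, n_splits=5, seed=0):
    """Yield (X_train, y_train, X_val, y_val); only the training part is resampled."""
    folds = stratified_folds(y, n_splits, seed)
    for k, val in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        X_tr, y_tr = resample([X[i] for i in train], [y[i] for i in train])
        yield X_tr, y_tr, [X[i] for i in val], [y[i] for i in val]


def duplicate_minority(X, y, seed=0):
    """Placeholder resampler (random duplication), NOT SMOTE-Tomek."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    X = [(float(i),) for i in range(20)]
    y = ["F"] * 15 + ["NFR"] * 5
    for X_tr, y_tr, X_val, y_val in resampled_splits(X, y, duplicate_minority):
        print("train:", Counter(y_tr), "val:", Counter(y_val))
```

With scikit-learn and imbalanced-learn, the same effect is usually obtained by placing the sampler inside an `imblearn.pipeline.Pipeline` and evaluating it with `StratifiedKFold`, since that pipeline applies samplers only during fitting.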

Taxonomy

Methods: Logistic Regression