Improving Requirements Classification with SMOTE-Tomek Preprocessing
Barak Or

TL;DR
This paper demonstrates that applying SMOTE-Tomek preprocessing with stratified cross-validation significantly improves requirements classification accuracy, especially for minority classes, using machine learning models like logistic regression.
Contribution
It introduces a novel combination of SMOTE-Tomek preprocessing with stratified cross-validation for requirements classification, enhancing minority class representation and model performance.
Findings
Logistic regression accuracy improved to 76.16%, up from a 58.31% baseline.
SMOTE-Tomek preprocessing outperforms baseline methods.
Effective handling of class imbalance in requirements data.
Abstract
This study addresses class imbalance in requirements engineering by applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to the PROMISE dataset. The dataset comprises 969 requirements, categorized into functional and non-functional types. The proposed approach enhances the representation of minority classes while maintaining the integrity of validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved 76.16% accuracy, significantly surpassing the baseline of 58.31%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions.
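The abstract does not spell out the pipeline, so the following is only a minimal sketch of one way SMOTE-Tomek can be combined with stratified K-fold cross-validation so that validation folds stay untouched. It uses the Python standard library and toy 2-D points rather than the PROMISE data; all function names, the choice of k = 3 neighbours, the five folds, and the decision to drop both members of each Tomek link are assumptions made for illustration, not details taken from the paper.

```python
import math
import random
from collections import Counter

def smote(X, y, minority, k=3, rng=None):
    """Oversample `minority` up to the largest class size by interpolating
    between a minority point and one of its k nearest minority neighbours.
    Assumes at least two minority points are present."""
    rng = rng or random.Random(0)
    counts = Counter(y)
    need = max(counts.values()) - counts[minority]
    idx = [i for i, label in enumerate(y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(need):
        i = rng.choice(idx)
        neighbours = sorted((j for j in idx if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: two points of different
    classes that are each other's nearest neighbour (assumed variant)."""
    n = len(X)
    nn = [min((j for j in range(n) if j != i),
              key=lambda j: math.dist(X[i], X[j])) for i in range(n)]
    drop = {i for i in range(n) if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

def stratified_folds(y, n_splits, rng):
    """Deal the indices of each class round-robin into n_splits folds."""
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % n_splits].append(i)
    return folds

def cross_validation_splits(X, y, minority, n_splits=5, seed=0):
    """Yield (resampled training set, untouched validation set) per fold."""
    rng = random.Random(seed)
    folds = stratified_folds(y, n_splits, rng)
    for k, val_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != k for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Resampling sees only training data; the validation fold is left as-is.
        X_res, y_res = remove_tomek_links(*smote(X_tr, y_tr, minority, rng=rng))
        yield (X_res, y_res), ([X[i] for i in val_idx], [y[i] for i in val_idx])

# Toy imbalanced data (hypothetical): 40 majority points, 10 minority points.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 40 + [1] * 10

splits = list(cross_validation_splits(X, y, minority=1))
for (X_res, y_res), (X_val, y_val) in splits:
    print("train:", dict(Counter(y_res)), "validation:", dict(Counter(y_val)))
```

In this sketch a classifier such as logistic regression would be fitted on each resampled training set and scored on the corresponding untouched validation fold. Resampling inside the loop, rather than once before splitting, is what keeps synthetic points out of the validation data.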
Taxonomy
Methods: Logistic Regression
