Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling
Xiaofan Liu, Zuoquan Zhang, Di Wang

TL;DR
This paper introduces WHSBoost, an ensemble method using weighted hybrid sampling techniques to improve classification of imbalanced credit scoring datasets, demonstrating robustness across various data types and classifiers.
Contribution
The paper presents a novel ensemble algorithm, WHSBoost, combining Weighted-SMOTE and Weighted-Under-Sampling for better handling of imbalanced credit scoring data.
Findings
WHSBoost outperforms traditional sampling methods in classification accuracy.
The method is effective across multiple classifiers and datasets.
WHSBoost improves minority class detection in credit scoring applications.
Abstract
In the era of big data, using credit-scoring models to determine applicants' credit risk accurately is becoming a trend. Conventional machine learning on credit scoring data sets tends to classify the minority class poorly, which may cause serious commercial harm to banks. To classify imbalanced data sets, we propose a new ensemble algorithm, Weighted-Hybrid-Sampling-Boost (WHSBoost). In the data sampling step, we process the weighted imbalanced data set with the Weighted-SMOTE method and the Weighted-Under-Sampling method, and thus obtain a balanced training set with equal weights. In the ensemble step, each time we train a base classifier, the balanced data set is produced by the sampling method above. To verify the applicability and robustness of the WHSBoost algorithm, we performed experiments on the simulation data…
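The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the details of Weighted-SMOTE or Weighted-Under-Sampling, so the function name `weighted_hybrid_sample`, the parameters `n_per_class` and `k`, the 0/1 label convention, and the choice to draw rows with probability proportional to their weight are all assumptions made for illustration.

```python
import math
import random


def weighted_hybrid_sample(X, y, w, n_per_class, k=3, seed=0):
    """Return a class-balanced sample (X_bal, y_bal) from weighted data.

    Assumptions (not taken from the paper): labels are 0 (majority) and
    1 (minority), all weights are positive, distances are Euclidean, and
    n_per_class is at least the number of minority rows.
    """
    rng = random.Random(seed)
    maj = [i for i, label in enumerate(y) if label == 0]
    mino = [i for i, label in enumerate(y) if label == 1]

    # Weighted under-sampling (assumed form): draw majority rows without
    # replacement, with probability proportional to their current weight.
    pool = list(maj)
    kept = []
    while pool and len(kept) < n_per_class:
        pick = rng.choices(range(len(pool)), weights=[w[i] for i in pool])[0]
        kept.append(pool.pop(pick))

    # Weighted SMOTE-style over-sampling (assumed form): choose a minority
    # seed row in proportion to its weight, then interpolate towards one of
    # its k nearest minority neighbours.
    synthetic = []
    for _ in range(max(0, n_per_class - len(mino))):
        s = rng.choices(mino, weights=[w[i] for i in mino])[0]
        neighbours = sorted(
            (j for j in mino if j != s),
            key=lambda j: math.dist(X[s], X[j]),
        )[:k]
        t = rng.choice(neighbours) if neighbours else s
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[s], X[t])])

    # Every returned row is meant to carry the same weight afterwards.
    X_bal = [list(X[i]) for i in kept] + [list(X[i]) for i in mino] + synthetic
    y_bal = [0] * len(kept) + [1] * (len(mino) + len(synthetic))
    return X_bal, y_bal


# Toy data: six majority rows, two minority rows, uniform starting weights.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [0.5, 0.5],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
w = [1 / len(y)] * len(y)
X_bal, y_bal = weighted_hybrid_sample(X, y, w, n_per_class=4)
```

In a boosting loop, a function like this would be called once per round with the updated weights, and the base classifier would be trained on the returned balanced set. How WHSBoost updates the weights between rounds is not stated in the abstract, so that part is left out here.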
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction
Methods: Support Vector Machine · Synthetic Minority Over-sampling Technique
