Finding the Sweet Spot: Optimal Data Augmentation Ratio for Imbalanced Credit Scoring Using ADASYN
Luis H. Chia

TL;DR
This study empirically determines the optimal data augmentation ratio for imbalanced credit scoring. ADASYN at 1x augmentation, which doubles the minority class, yields the best predictive performance, challenging the common practice of full balancing to a 1:1 ratio.
Contribution
The study provides the first empirical evidence of an optimal augmentation ratio in credit scoring and offers practical guidelines for selecting augmentation levels in imbalanced datasets.
Findings
ADASYN with 1x augmentation (doubling the minority class) achieved the highest AUC and Gini.
Higher augmentation factors (2x, 3x) degraded model performance.
The optimal class imbalance ratio (majority:minority) was approximately 6.6:1, not 1:1.
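The 6.6:1 figure follows from the 7% default rate if 1x augmentation is read as adding one synthetic sample per real default, which matches the TL;DR's "doubling the minority class". A minimal sketch of that arithmetic follows; the function name and this reading of the augmentation factor are assumptions for illustration, not taken from the paper.

```python
def imbalance_after_augmentation(default_rate, factor):
    """Majority:minority ratio after oversampling.

    Assumption: `factor` is the number of synthetic samples added per
    real minority sample, so factor=1 doubles the minority class.
    """
    minority = default_rate * (1 + factor)  # real + synthetic defaults
    majority = 1.0 - default_rate           # non-defaults are left untouched
    return majority / minority


# 7% default rate, as reported for the dataset in the abstract
for factor in (0, 1, 2, 3):
    ratio = imbalance_after_augmentation(0.07, factor)
    print(f"{factor}x augmentation -> {ratio:.1f}:1")
```

Under this reading, the unaugmented data sits at roughly 13.3:1, and reaching a full 1:1 balance would take about 12 synthetic samples per real default.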
Abstract
Credit scoring models face a critical challenge: severe class imbalance, with default rates typically below 10%, which hampers model learning and predictive performance. While synthetic data augmentation techniques such as SMOTE and ADASYN have been proposed to address this issue, the optimal augmentation ratio remains unclear, with practitioners often defaulting to full balancing (1:1 ratio) without empirical justification. This study systematically evaluates 10 data augmentation scenarios using the Give Me Some Credit dataset (97,243 observations, 7% default rate), comparing SMOTE, BorderlineSMOTE, and ADASYN at different multiplication factors (1x, 2x, 3x). All models were trained using XGBoost and evaluated on a held-out test set of 29,173 real observations. Statistical significance was assessed using bootstrap testing with 1,000 iterations. Key findings reveal that ADASYN with…
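The abstract reports AUC and Gini and mentions bootstrap testing with 1,000 iterations, but does not describe the bootstrap design. The sketch below shows one common choice, a paired percentile bootstrap over test-set rows, using only the standard library. The function names, the 95% interval, and the paired design are assumptions, not the author's procedure. Gini is computed as 2 * AUC - 1, the usual convention in credit scoring.

```python
import random


def auc(labels, scores):
    """ROC AUC via the rank-sum identity; tied scores share an average rank.

    `labels` must be 0/1 integers with both classes present.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini(labels, scores):
    """Gini coefficient derived from AUC."""
    return 2 * auc(labels, scores) - 1


def bootstrap_auc_difference(labels, scores_a, scores_b, n_iter=1000, seed=0):
    """95% percentile interval for AUC(a) - AUC(b) on resampled test rows."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain a single class
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    return diffs[int(0.025 * n_iter)], diffs[int(0.975 * n_iter) - 1]
```

Here `scores_a` and `scores_b` would be the predicted default probabilities of two models on the same held-out rows. An interval that excludes zero suggests the AUC gap is unlikely to be a resampling artifact.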
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Imbalanced Data Classification Techniques · Credit Risk and Financial Regulations
