The impact of class imbalance in logistic regression models for low-default portfolios in credit risk
Willem D. Schutte, Charl Pretorius, Neill Smit, Leandra van der Merwe, Robert Maxwell

TL;DR
This study investigates how class imbalance affects logistic regression performance in low-default credit portfolios. Classification accuracy declines as the event rate falls, while the Gini coefficient remains comparatively stable once sample sizes are sufficiently large.
Contribution
It provides a simulation-based analysis of class imbalance effects on logistic regression, offering practical performance guidelines for credit risk modeling.
Findings
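For a given strength of association, achievable classification accuracy drops markedly as the event rate decreases.
The Gini coefficient is comparatively stable under class imbalance once sample sizes are sufficiently large.
The optimal classification cut-off shifts with the level of class imbalance.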
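The quantities named in these findings can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the Gini coefficient is computed as 2 * AUC - 1 from the rank-sum form of the AUC, and the "optimal" cut-off is taken to be the one maximising Youden's J statistic, which is an assumption because the summary does not say which criterion the paper uses.

```python
def gini(scores, labels):
    """Gini = 2 * AUC - 1, with AUC from the Mann-Whitney rank-sum statistic."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores and give each member the average rank.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return 2.0 * auc - 1.0


def accuracy_at(scores, labels, cutoff):
    """Share of cases classified correctly when score >= cutoff predicts an event."""
    hits = sum((s >= cutoff) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


def youden_cutoff(scores, labels, grid=None):
    """Cut-off on a grid that maximises sensitivity + specificity - 1.

    Assumption: Youden's J is used here as the optimality criterion.
    """
    grid = grid or [k / 100 for k in range(1, 100)]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos

    def j_stat(c):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        return tp / n_pos + tn / n_neg - 1.0

    return max(grid, key=j_stat)
```

Because the Gini coefficient depends only on how events rank against non-events, it does not require a cut-off at all, whereas accuracy is defined only once a cut-off has been chosen. That difference is one way to read why the two measures can respond differently to a falling event rate.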
Abstract
In this paper, we study how class imbalance, typical of low-default credit portfolios, affects the performance of logistic regression models. Using a simulation study with controlled data-generating mechanisms, we vary (i) the level of class imbalance and (ii) the strength of association between the predictors and the response. The results show that, for a given strength of association, achievable classification accuracy deteriorates markedly as the event rate decreases, and the optimal classification cut-off shifts with the level of imbalance. In contrast, the Gini coefficient is comparatively stable with respect to class imbalance once sample sizes are sufficiently large, even when classification accuracy is strongly affected. As a practical guideline, we summarise attainable classification performance as a function of the event rate and strength of association between the predictors…
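A simulation design of the kind the abstract describes could look roughly like the sketch below. It is written under stated assumptions and is not the paper's data-generating mechanism: it assumes a single standard-normal predictor, uses the slope `beta1` as the strength of association, and sets the event rate by calibrating the intercept with bisection. All function names and parameter values are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrate_intercept(beta1, target_rate, xs):
    """Bisection on the intercept so the mean event probability hits target_rate."""
    lo, hi = -30.0, 30.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        rate = sum(sigmoid(mid + beta1 * x) for x in xs) / len(xs)
        if rate < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate(n, beta1, target_rate, rng):
    """Draw one predictor (assumed standard normal) and Bernoulli responses."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    beta0 = calibrate_intercept(beta1, target_rate, xs)
    ys = [1 if rng.random() < sigmoid(beta0 + beta1 * x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson maximum likelihood for an intercept and one slope."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break  # information matrix is (near) singular; stop updating
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def run_grid(n=5000, seed=1):
    """Vary event rate and slope, then report the fitted coefficients."""
    rng = random.Random(seed)
    for beta1 in (0.5, 1.5):
        for rate in (0.20, 0.05, 0.01):
            xs, ys = simulate(n, beta1, rate, rng)
            b0, b1 = fit_logistic(xs, ys)
            print(f"beta1={beta1} target={rate} observed={sum(ys) / n:.3f} "
                  f"fit=({b0:.2f}, {b1:.2f})")


if __name__ == "__main__":
    run_grid()
```

Calibrating the intercept keeps the two experimental factors separate: the slope controls how strongly the predictor is associated with the response, and the intercept alone moves the event rate. Performance measures such as accuracy or the Gini coefficient could then be computed on the fitted probabilities for each cell of the grid.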
Taxonomy
Topics: Financial Distress and Bankruptcy Prediction · Credit Risk and Financial Regulations · Imbalanced Data Classification Techniques
