Credit risk prediction in an imbalanced social lending environment
Anahita Namvar, Mohammad Siami, Fethi Rabhi, Mohsen Naderpour

TL;DR
This paper compares combinations of classifiers and resampling techniques for credit risk prediction on imbalanced social lending data. The comparison sits within a novel risk assessment methodology and uses the G-mean measure to emphasize accuracy on the minority class.
Contribution
It presents an empirical comparison of classifier and resampling combinations, evaluated with G-mean, and highlights the effectiveness of random forest with random under-sampling for social lending risk assessment.
Findings
Random forest with random under-sampling performed best among the tested combinations.
G-mean evaluates minority-class prediction without bias toward the majority class.
Handling imbalanced data improves credit risk models.
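Random under-sampling balances the training data by discarding randomly chosen majority-class rows (here, repaid loans) until each class is as large as the smallest one. A classifier such as random forest is then trained on the balanced set. The sketch below uses only the Python standard library. The function name `random_undersample` and its 1:1 target ratio are illustrative assumptions, since the exact ratio and settings used in the paper are not stated here. Resampling is normally applied to the training split only, so that the test set keeps its real class distribution.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(
    X: Sequence[T], y: Sequence[int], seed: Optional[int] = None
) -> Tuple[List[T], List[int]]:
    """Hypothetical helper: randomly drop rows from larger classes until
    every class has as many rows as the smallest class (1:1 balance)."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    if not y:
        raise ValueError("cannot resample an empty dataset")

    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    # Size of the smallest (minority) class.
    n_min = min(len(indices) for indices in by_class.values())

    # Sample n_min rows without replacement from each class; the minority
    # class is therefore kept in full.
    keep: List[int] = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))

    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Toy data: 8 repaid loans (label 0) and 2 defaults (label 1).
    X = [f"loan_{i}" for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_undersample(X, y, seed=42)
    print(X_bal, y_bal)  # 2 repaid + 2 defaulted loans
```

Under-sampling discards information from the majority class, which is the usual trade-off against over-sampling methods that synthesize or duplicate minority rows instead.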
Abstract
Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the…
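The G-mean is the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class), G-mean = sqrt(sensitivity × specificity). It drops to zero whenever either class is entirely misclassified, which is why it does not reward a model that simply predicts the majority class. The sketch below computes it in plain Python. Treating "default" as the positive class (label 1) is an assumption made for illustration.

```python
import math
from typing import Sequence


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity and specificity for a binary task.

    Assumption: `positive` is the minority class (e.g. defaulted loans)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # default correctly flagged
            else:
                fn += 1  # default missed
        else:
            if p == positive:
                fp += 1  # good loan wrongly flagged
            else:
                tn += 1  # good loan correctly accepted

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # 8 repaid loans (0) and 2 defaults (1); the model predicts "repaid" for all.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)                # 0.8
    print(g_mean(y_true, y_pred))  # 0.0
```

The example shows the bias the abstract refers to. Accuracy looks reasonable at 0.8, while the G-mean of 0.0 exposes that no default was detected.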
