# Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions

**Authors:** Anna Stelzer

arXiv: 1907.12996 · 2019-07-31

## TL;DR

This paper benchmarks 23 statistical and machine learning methods for credit scoring across four data sets and five data sampling strategies for handling class imbalance. Ensemble methods perform best, and simple sampling strategies outperform more sophisticated ones.

## Contribution

The paper provides a comprehensive comparison of 23 models and five data sampling strategies for credit default prediction, evaluated with six performance measures, and highlights the strength of ensemble methods and simple sampling approaches.
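The size of such a benchmark follows directly from the counts in the abstract. The sketch below assumes a fully crossed design, where every model is trained on every data set under every sampling strategy. The labels (`model_1`, `dataset_1`, and so on) are hypothetical placeholders, because this summary does not name the actual models, data sets, or strategies.

```python
from itertools import product

# Counts come from the abstract; the names are hypothetical placeholders.
MODELS = [f"model_{i}" for i in range(1, 24)]      # 23 methods
DATASETS = [f"dataset_{i}" for i in range(1, 5)]   # 4 data sets
SAMPLERS = [f"sampler_{i}" for i in range(1, 6)]   # 5 sampling strategies
MEASURES = [f"measure_{i}" for i in range(1, 7)]   # 6 performance measures


def benchmark_grid():
    """Yield every (model, dataset, sampler) combination of a fully crossed design."""
    yield from product(MODELS, DATASETS, SAMPLERS)


configs = list(benchmark_grid())
print(len(configs))                   # 460 model/data/sampling configurations
print(len(configs) * len(MEASURES))   # 2760 individual scores
```

If the paper omits some combinations or adds a no-sampling baseline, these totals change accordingly.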

## Key findings

- Ensemble methods clearly outperform the other models.
- Simple sampling strategies deliver better results than more sophisticated ones.
- The results hold across six performance measures covering different aspects of predictive performance.
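This summary does not name the five sampling strategies. In the class-imbalance literature, random undersampling and random oversampling are the usual "simple" baselines, so the sketch below assumes those two. It uses only the standard library and is not the paper's implementation. More sophisticated strategies, such as SMOTE, synthesize new minority cases instead of resampling existing ones.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Map each class label to the list of feature rows carrying that label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows of larger classes until every class has the minority count."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until every class has the majority count."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]  # with replacement
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Toy training data: label 1 = default (minority class), 0 = non-default.
X = [[float(i)] for i in range(10)]
y = [1, 1] + [0] * 8

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(sorted(Counter(y_under).items()))  # [(0, 2), (1, 2)]
print(sorted(Counter(y_over).items()))   # [(0, 8), (1, 8)]
```

Resampling like this is normally applied to the training split only, so that test-set performance still reflects the original class distribution.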

## Abstract

This study benchmarks 23 different statistical and machine learning methods in a credit scoring application. To do so, the models' performance is evaluated on four different data sets in combination with five data sampling strategies that address the class imbalance in the data. Six different performance measures are used to cover different aspects of predictive performance. The results indicate a strong superiority of ensemble methods and show that simple sampling strategies deliver better results than more sophisticated ones.
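Using several performance measures matters because, under class imbalance, a single measure can mislead. The six measures are not listed in this summary. The sketch below therefore assumes three common ones as examples: accuracy, ROC AUC (computed via the Mann-Whitney statistic), and the Brier score. It shows that a model assigning every borrower the same default probability can reach high accuracy while having no ranking ability.

```python
def auc(y_true, scores):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    defaulter is scored above a random non-defaulter (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted default probability and outcome."""
    return sum((p - t) ** 2 for p, t in zip(probs, y_true)) / len(y_true)


def accuracy(y_true, probs, threshold=0.5):
    """Share of cases classified correctly when prob >= threshold means 'default'."""
    hits = sum((p >= threshold) == (t == 1) for p, t in zip(probs, y_true))
    return hits / len(y_true)


y = [1] + [0] * 9           # one defaulter among ten borrowers
naive = [0.1] * 10          # the same default probability for everyone
print(accuracy(y, naive))   # 0.9: looks good despite no discrimination
print(auc(y, naive))        # 0.5: no better than a random ranking
print(round(brier_score(y, naive), 4))  # 0.09
```

AUC depends only on how borrowers are ranked, while the Brier score also rewards well-calibrated probabilities. That difference is one reason benchmarks report several measures side by side.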

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12996/full.md

## Figures

20 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12996/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1907.12996/full.md

---
Source: https://tomesphere.com/paper/1907.12996