Assessment of creditworthiness models privacy-preserving training with synthetic data

Ricardo Muñoz-Cancino, Cristián Bravo, Sebastián A. Ríos, and Manuel Graña

arXiv:2301.01212·q-fin.RM·January 4, 2023

TL;DR

This paper evaluates how well credit scoring models trained on privacy-preserving synthetic data perform on real-world data. It reports modest performance drops (3% in AUC, 6% in KS) in exchange for privacy protection and easier data access.

Contribution

It introduces a methodology to assess creditworthiness models trained on synthetic data and compares their performance to real-data models.

Findings

01 Synthetic data quality degrades as the number of attributes increases

02 Models trained on synthetic data show a 3% reduction in AUC compared with models trained on real data

03 Models trained on synthetic data show a 6% reduction in KS compared with models trained on real data

Abstract

Credit scoring models are the primary instrument used by financial institutions to manage credit risk. Research on behavioral scoring is scarce because data access is difficult. The obligation of financial institutions to maintain the privacy and security of borrowers' information keeps them from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality becomes increasingly poor as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of only 3% in AUC and 6% in KS when compared with models trained with real data. These results have a significant impact since they encourage credit risk research based on synthetic data, making it possible…
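The comparison described in the abstract rests on two discrimination metrics, AUC and KS, computed on the same real-world holdout for a model trained on real data and a model trained on synthetic data. The following is a minimal sketch of how such a comparison could be computed, not the authors' implementation. The function names, the toy labels and scores, and the choice to express the drop as a relative reduction are all assumptions made for illustration; none of the numbers come from the paper.

```python
from typing import Sequence


def split_by_class(scores: Sequence[float], labels: Sequence[int]):
    """Separate scores of positives (label 1) from negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos, neg = split_by_class(scores, labels)
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(scores: Sequence[float], labels: Sequence[int]) -> float:
    """KS statistic: largest gap between the two classes' score CDFs."""
    pos, neg = split_by_class(scores, labels)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def relative_drop(real_metric: float, synth_metric: float) -> float:
    """Reduction of the synthetic-trained metric relative to the real-trained one."""
    return (real_metric - synth_metric) / real_metric


# Hypothetical real-world holdout labels (1 = the event being predicted).
y_real = [0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical scores on that holdout from a model trained on real data...
scores_real_trained = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
# ...and from a model trained on synthetic data (illustrative values only).
scores_synth_trained = [0.1, 0.6, 0.3, 0.4, 0.5, 0.7, 0.2, 0.9]

auc_real = auc(scores_real_trained, y_real)
auc_synth = auc(scores_synth_trained, y_real)
ks_real = ks(scores_real_trained, y_real)
ks_synth = ks(scores_synth_trained, y_real)

print(f"AUC: real={auc_real:.2f} synth={auc_synth:.2f} "
      f"drop={relative_drop(auc_real, auc_synth):.0%}")
print(f"KS:  real={ks_real:.2f} synth={ks_synth:.2f} "
      f"drop={relative_drop(ks_real, ks_synth):.0%}")
```

Both models are scored against the same real labels, so any gap between the two rows reflects the training data rather than the evaluation set. Whether the paper's 3% and 6% figures are relative or absolute reductions is not stated in the abstract; the sketch uses a relative reduction purely as an example.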
