Assessment of creditworthiness models privacy-preserving training with synthetic data
Ricardo Muñoz-Cancino, Cristián Bravo, Sebastián A. Ríos, and Manuel Graña

TL;DR
This paper evaluates how well credit scoring models trained on privacy-preserving synthetic data perform on real data, finding modest performance drops (about 3% in AUC and 6% in KS) in exchange for stronger privacy and easier data access.
Contribution
It introduces a methodology to assess creditworthiness models trained on synthetic data and compares their performance to real-data models.
Findings
Synthetic data quality decreases as attribute count increases
Models trained on synthetic data show a 3% reduction in AUC
Models trained on synthetic data show a 6% reduction in KS
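To make the AUC and KS comparisons above concrete, here is a minimal sketch of how the two metrics and the gap between a real-data model and a synthetic-data model might be computed when both are scored on the same real hold-out set. Everything in it is an assumption for illustration: the toy labels and scores are made up, the helper names `auc`, `ks`, and `relative_drop` are hypothetical, label 1 is taken as the positive class with higher scores meaning "more likely positive", and the drop is computed as a relative change, which is only one possible reading of the reported percentages.

```python
from bisect import bisect_right


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks(scores, labels):
    """KS statistic: largest gap between the two classes' empirical score CDFs."""
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    return max(
        abs(bisect_right(pos, t) / len(pos) - bisect_right(neg, t) / len(neg))
        for t in set(scores)
    )


def relative_drop(real_metric, synth_metric):
    """Relative reduction of a metric (assumed reading of the reported percentages)."""
    return (real_metric - synth_metric) / real_metric


# Hypothetical toy data: true labels of a real hold-out set, plus the scores
# given to it by a model trained on real data and one trained on synthetic data.
labels = [1, 1, 0, 1, 0, 0]
scores_real = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
scores_synth = [0.9, 0.4, 0.5, 0.7, 0.2, 0.1]

for name, metric in (("AUC", auc), ("KS", ks)):
    real = metric(scores_real, labels)
    synth = metric(scores_synth, labels)
    print(f"{name}: real={real:.3f} synth={synth:.3f} drop={relative_drop(real, synth):.1%}")
```

The pairwise AUC loop is quadratic in the sample size, which is fine for a sketch; on real portfolios a library routine would be the more practical choice.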
Abstract
Credit scoring models are the primary instrument used by financial institutions to manage credit risk. Research on behavioral scoring is scarce because data access is difficult: financial institutions must maintain the privacy and security of borrowers' information, which keeps them from collaborating in research initiatives. In this work, we present a methodology that allows us to evaluate the performance of models trained with synthetic data when they are applied to real-world data. Our results show that synthetic data quality becomes increasingly poor as the number of attributes increases. However, creditworthiness assessment models trained with synthetic data show a reduction of 3% in AUC and 6% in KS when compared with models trained with real data. These results have a significant impact since they encourage credit risk investigation from synthetic data, making it possible…
