Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation
Michael Zuo, Inwon Kang, Stacy Patterson, Oshani Seneviratne

TL;DR
This paper evaluates the privacy risks and utility tradeoffs of various synthetic data generation methods on financial datasets, highlighting challenges posed by class imbalance and mixed data types.
Contribution
It introduces novel privacy-preserving implementations of GAN and autoencoder synthesizers tailored for financial data and systematically assesses their performance.
Findings
GAN and autoencoder methods can balance privacy and utility with domain-specific adaptations
Class imbalance significantly affects synthetic data quality and privacy risks
Severe class imbalance and mixed-type attributes pose distinct challenges for synthetic data generation
Abstract
We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We consider representative tabular data generators, including autoencoders, generative adversarial networks, diffusion, and copula synthesizers. To address the challenges of the financial domain, we provide novel privacy-preserving implementations of GAN and autoencoder synthesizers. We evaluate whether and how well the generators simultaneously achieve data quality, downstream utility, and privacy, with comparison across balanced and imbalanced input datasets. Our results offer insight into the distinct challenges of generating synthetic data from datasets that exhibit severe class imbalance and mixed-type attributes.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
