Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation

Michael Zuo; Inwon Kang; Stacy Patterson; Oshani Seneviratne

arXiv:2602.09288·cs.LG·February 11, 2026

Open Access

TL;DR

This paper evaluates the privacy risks and utility tradeoffs of various synthetic data generation methods on financial datasets, highlighting challenges posed by class imbalance and mixed data types.

Contribution

It introduces novel privacy-preserving implementations of GAN and autoencoder synthesizers tailored for financial data and systematically assesses their performance.

Findings

1. GAN and autoencoder methods can balance privacy and utility with domain-specific adaptations.

2. Class imbalance significantly affects synthetic data quality and privacy risks.

3. Mixed attribute types pose distinct challenges for synthetic data generation.

Abstract

We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We consider representative tabular data generators, including autoencoders, generative adversarial networks, diffusion, and copula synthesizers. To address the challenges of the financial domain, we provide novel privacy-preserving implementations of GAN and autoencoder synthesizers. We evaluate whether and how well the generators simultaneously achieve data quality, downstream utility, and privacy, with comparison across balanced and imbalanced input datasets. Our results offer insight into the distinct challenges of generating synthetic data from datasets that exhibit severe class imbalance and mixed-type attributes.
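
To make the evaluation setting concrete, the following is a minimal sketch of two quantities commonly examined in this kind of study: a distance-to-closest-record (DCR) privacy proxy over mixed-type rows, and a class imbalance ratio. The abstract does not state which metrics the paper uses, so the choice of DCR, the Gower-style distance, and every function and column name here are illustrative assumptions rather than the authors' method.

```python
from typing import List, Sequence, Union

Value = Union[float, int, str]
Row = Sequence[Value]


def column_ranges(rows: Sequence[Row]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 for categorical columns."""
    ranges = []
    for col in zip(*rows):
        if isinstance(col[0], str):
            ranges.append(0.0)  # categorical: range is unused
        else:
            ranges.append(float(max(col) - min(col)))
    return ranges


def mixed_distance(a: Row, b: Row, ranges: Sequence[float]) -> float:
    """Gower-style distance (an assumed choice): mean per-attribute distance in [0, 1]."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0  # categorical mismatch
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # range-normalized numeric gap
    return total / len(a)


def distance_to_closest_record(real: Sequence[Row], synthetic: Sequence[Row]) -> List[float]:
    """For each synthetic row, the distance to its nearest real row.

    Values near 0 suggest a synthetic row nearly copies a real record.
    """
    ranges = column_ranges(list(real) + list(synthetic))
    return [min(mixed_distance(s, r, ranges) for r in real) for s in synthetic]


def imbalance_ratio(labels: Sequence[str]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical toy rows: (transaction amount, channel).
    real_rows = [(100.0, "debit"), (250.0, "credit"), (900.0, "debit")]
    synthetic_rows = [(100.0, "debit"), (500.0, "credit")]
    print(distance_to_closest_record(real_rows, synthetic_rows))
    print(imbalance_ratio(["legit"] * 98 + ["fraud"] * 2))
```

In this toy input the first synthetic row duplicates a real record, so its DCR is zero, which is the kind of signal a privacy audit would flag. With severe class imbalance, the few minority-class records are the ones most exposed to such near-copies, which is one reason the balanced and imbalanced settings are worth comparing separately.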

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare