Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation
Timur Sattarov, Marco Schreyer, Damian Borth

TL;DR
DP-FinDiff is a novel differentially private diffusion framework that synthesizes high-quality mixed-type tabular data by employing embedding-based representations and privacy-aware training strategies, improving utility while maintaining privacy.
Contribution
The paper introduces DP-FinDiff, a differentially private diffusion model that represents categorical features with embeddings and adds two privacy-aware training strategies: an adaptive timestep sampler and a feature-aggregated loss.
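To make the encoding-overhead point concrete, here is a minimal sketch comparing the input width of one-hot encoding with that of per-feature embeddings. The cardinalities and the embedding dimension are hypothetical values chosen for illustration; they are not taken from the paper, and the helper names are assumptions.

```python
def one_hot_width(cardinalities):
    """Input width when every categorical feature is one-hot encoded."""
    return sum(cardinalities)


def embedding_width(cardinalities, embed_dim):
    """Input width when every categorical feature maps to a dense vector."""
    return len(cardinalities) * embed_dim


# Hypothetical table with three categorical columns (illustrative only).
cardinalities = [12, 250, 4000]
embed_dim = 8  # assumed embedding size, not from the paper

print(one_hot_width(cardinalities))               # 4262
print(embedding_width(cardinalities, embed_dim))  # 24
```

The one-hot width grows with the total number of categories, while the embedding width grows only with the number of categorical columns, which is one plausible reading of why embeddings help on high-cardinality tables.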
Findings
Achieves 16-42% higher utility than DP baselines on financial and medical datasets at comparable privacy levels.
Effectively scales to high-dimensional mixed-type tabular data.
Maintains strong privacy guarantees while improving data utility.
Abstract
We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Together, these enhancements improve fidelity and downstream utility without weakening privacy guarantees. On financial and medical datasets, DP-FinDiff achieves 16-42% higher utility than DP baselines at comparable privacy levels, demonstrating its promise for safe and effective data sharing in sensitive domains.
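The abstract mentions DP training and clipping-induced bias without spelling out the mechanism. The sketch below assumes the standard DP-SGD recipe (clip each per-example gradient to a maximum L2 norm, sum, add Gaussian noise, average); it is not the paper's implementation, and the function and parameter names are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """One DP-SGD style aggregation step (assumed recipe, not the paper's code)."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The Gaussian noise scale is tied to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 0.5]]  # toy per-example gradients
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

In this toy batch the first gradient (norm 5) is shrunk to norm 1 while the second passes through unchanged, so the clipped average no longer points in the direction of the true average. That distortion is the kind of clipping-induced bias the feature-aggregated loss is said to mitigate.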
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Big Data and Digital Economy
