Privacy Preserving Diffusion Models for Mixed-Type Tabular Data Generation

Timur Sattarov; Marco Schreyer; Damian Borth

arXiv:2512.00638 · cs.LG · December 2, 2025

TL;DR

DP-FinDiff is a differentially private diffusion framework that synthesizes mixed-type tabular data using embedding-based representations for categorical features and privacy-aware training strategies, improving utility over DP baselines while preserving privacy guarantees.

Contribution

The paper introduces DP-FinDiff, a differentially private diffusion model that represents categorical features with embeddings and adds two privacy-aware training strategies: an adaptive timestep sampler and a feature-aggregated loss.
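To make the encoding-overhead argument concrete, the sketch below compares the input width of one-hot encoding with a per-column embedding lookup for a hypothetical schema. The column cardinalities, the embedding width of 8, and the helper names (`embed_categoricals`, `decode_column`) are illustrative assumptions. The source does not say how DP-FinDiff sizes or trains its embeddings, and nearest-neighbour decoding is only one plausible way to map generated vectors back to category ids.

```python
import numpy as np

# Hypothetical schema: cardinality of each categorical column (illustrative only).
CARDINALITIES = [500, 120, 40, 12]
EMBED_DIM = 8  # assumed embedding width per categorical column


def one_hot_width(cardinalities):
    """Input width if every category value gets its own one-hot slot."""
    return sum(cardinalities)


def embedding_width(cardinalities, embed_dim):
    """Input width if each column is mapped to one dense vector of size embed_dim."""
    return len(cardinalities) * embed_dim


def embed_categoricals(codes, tables):
    """Look up one embedding row per column and concatenate the results.

    codes:  int array (batch, n_columns) of category ids
    tables: list with tables[j] of shape (cardinality_j, embed_dim)
    returns float array (batch, n_columns * embed_dim)
    """
    parts = [tables[j][codes[:, j]] for j in range(codes.shape[1])]
    return np.concatenate(parts, axis=1)


def decode_column(vectors, table):
    """Map each vector back to the id of its nearest embedding row (squared L2)."""
    d2 = ((vectors[:, None, :] - table[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


rng = np.random.default_rng(0)
tables = [rng.normal(size=(k, EMBED_DIM)) for k in CARDINALITIES]
codes = np.stack([rng.integers(0, k, size=4) for k in CARDINALITIES], axis=1)
x_cat = embed_categoricals(codes, tables)

print(one_hot_width(CARDINALITIES))              # 672
print(embedding_width(CARDINALITIES, EMBED_DIM))  # 32
print(x_cat.shape)                                # (4, 32)
```

The one-hot width grows with the total number of category values (672 here), while the embedded width grows only with the number of columns (32 here), which is why embeddings help on high-cardinality, high-dimensional tables.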

Findings

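1. Achieves 16-42% higher utility than DP baselines on financial and medical datasets at comparable privacy levels.
2. Scales to high-dimensional mixed-type tabular data by using embedding-based representations for categorical features, which reduce encoding overhead.
3. Improves fidelity and downstream utility without weakening privacy guarantees.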

Abstract

We introduce DP-FinDiff, a differentially private diffusion framework for synthesizing mixed-type tabular data. DP-FinDiff employs embedding-based representations for categorical features, reducing encoding overhead and scaling to high-dimensional datasets. To adapt DP-training to the diffusion process, we propose two privacy-aware training strategies: an adaptive timestep sampler that aligns updates with diffusion dynamics, and a feature-aggregated loss that mitigates clipping-induced bias. Together, these enhancements improve fidelity and downstream utility without weakening privacy guarantees. On financial and medical datasets, DP-FinDiff achieves 16-42% higher utility than DP baselines at comparable privacy levels, demonstrating its promise for safe and effective data sharing in sensitive domains.
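The "clipping-induced bias" mentioned in the abstract comes from the per-example gradient clipping used in DP-SGD-style training. The following is a minimal NumPy sketch of one DP-SGD step for a toy linear noise predictor, assuming a standard DDPM forward process and uniform timestep sampling (the paper's adaptive sampler is not modeled). It assumes, hypothetically, that a "feature-aggregated" loss averages a record's per-feature squared errors into one scalar before clipping; the `aggregate` switch, the constants, and all function names are illustrative and may differ from the authors' formulation. Privacy accounting (turning the noise multiplier and sampling rate into an (ε, δ) guarantee) is omitted.

```python
import numpy as np

# Illustrative constants (assumptions, not values from the paper).
D = 16            # features per record after embedding
T = 100           # number of diffusion steps
CLIP_NORM = 5.0   # per-example L2 clipping bound C
NOISE_MULT = 1.1  # DP-SGD noise multiplier sigma
LR = 0.1          # learning rate

# Standard DDPM linear beta schedule and cumulative products alpha_bar_t.
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)


def per_example_grads(W, x0, t, eps, aggregate="mean"):
    """Per-record gradients of the denoising loss for eps_hat = x_t @ W.

    aggregate="sum":  loss_i = sum_j (eps_hat_ij - eps_ij)^2
    aggregate="mean": loss_i = mean_j (eps_hat_ij - eps_ij)^2  (hypothetical
                      "feature-aggregated" variant)
    Returns an array of shape (batch, D, D).
    """
    if aggregate not in ("sum", "mean"):
        raise ValueError("aggregate must be 'sum' or 'mean'")
    ab = alpha_bar[t][:, None]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps  # forward noising q(x_t | x_0)
    resid = x_t @ W - eps                              # eps_hat - eps
    scale = 2.0 / x0.shape[1] if aggregate == "mean" else 2.0
    # d loss_i / d W[k, j] = scale * x_t[i, k] * resid[i, j]
    return scale * np.einsum("bk,bj->bkj", x_t, resid)


def clip_per_example(grads, clip_norm):
    """Scale each record's gradient to L2 norm <= clip_norm.

    Also returns the fraction of records whose gradient was actually clipped.
    """
    norms = np.sqrt((grads ** 2).sum(axis=(1, 2)))
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * factors[:, None, None], float((norms > clip_norm).mean())


def dp_sgd_step(W, x0, rng, aggregate="mean"):
    """One DP-SGD update: clip per record, sum, add Gaussian noise, average."""
    b = x0.shape[0]
    t = rng.integers(0, T, size=b)          # uniform timesteps (assumption)
    eps = rng.standard_normal(x0.shape)
    grads = per_example_grads(W, x0, t, eps, aggregate)
    clipped, frac_clipped = clip_per_example(grads, CLIP_NORM)
    noise = rng.normal(0.0, NOISE_MULT * CLIP_NORM, size=W.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / b
    return W - LR * noisy_mean, frac_clipped


# Compare how often records are clipped under the two aggregations.
x0 = np.random.default_rng(0).standard_normal((64, D))  # stand-in data
W = np.zeros((D, D))
for agg in ("sum", "mean"):
    _, frac = dp_sgd_step(W, x0, np.random.default_rng(1), aggregate=agg)
    print(f"{agg:>4}: fraction of records clipped = {frac:.2f}")
```

With a fixed bound C, summing the squared error over features makes each record's gradient D times larger than averaging, so far more records hit the clipping bound and the averaged update is distorted toward equal-norm contributions. Rescaling the loss also changes the gradient's size relative to the added noise, so in practice C and the learning rate would be tuned together.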

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Big Data and Digital Economy