DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing

Utsab Saha; Tanvir Muntakim Tonoy; Hafiz Imtiaz

arXiv:2411.16121·stat.ML·April 30, 2026

TL;DR

DP-CDA is a novel algorithm that enhances privacy in dataset synthesis by using randomized class-specific mixing, providing stronger privacy guarantees and improved utility over existing methods.

Contribution

The paper introduces DP-CDA, a privacy-preserving data synthesis algorithm that balances privacy and utility through randomized class-specific mixing.
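
The TL;DR does not spell out what "randomized class-specific mixing" involves. As a rough illustration only, the sketch below assumes one plausible reading: each synthetic point is the average of k distinct real samples drawn at random from a single class, after each row is clipped to an L2 norm bound, with Gaussian noise added to the average. The function name `class_mix_sketch` and its parameters (`k`, `n_out`, `clip_norm`, `sigma`) are hypothetical. This is not the authors' DP-CDA implementation, and it performs no privacy accounting, so `sigma` is not calibrated to any (ε, δ) budget.

```python
import numpy as np


def class_mix_sketch(X, y, k, n_out, clip_norm, sigma, rng=None):
    """Hypothetical per-class mixing sketch (not the authors' DP-CDA code).

    For each class, produce n_out synthetic points, each the mean of k distinct
    real samples drawn uniformly at random from that class, plus isotropic
    Gaussian noise with standard deviation sigma. Rows are first clipped to
    L2 norm <= clip_norm. No privacy accounting is performed here.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Clip each row so that no single sample has L2 norm above clip_norm.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    X_syn, y_syn = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        if len(Xc) < k:
            continue  # too few samples in this class to mix k distinct ones
        for _ in range(n_out):
            idx = rng.choice(len(Xc), size=k, replace=False)  # k distinct rows
            mixed = Xc[idx].mean(axis=0)  # mix only within class c
            X_syn.append(mixed + rng.normal(0.0, sigma, size=mixed.shape))
            y_syn.append(c)  # the synthetic point keeps its class label
    return np.array(X_syn), np.array(y_syn)


# Usage on toy data: two balanced classes of 5-dimensional points.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.repeat([0, 1], 50)
X_syn, y_syn = class_mix_sketch(X, y, k=4, n_out=10, clip_norm=1.0, sigma=0.1, rng=rng)
print(X_syn.shape)  # (20, 5): 10 synthetic points per class
```

Mixing only within a class lets each synthetic point keep a meaningful label. Clipping bounds how far any one real sample can move a k-sample average (by at most 2·clip_norm/k when that sample is replaced), which is the kind of quantity a real privacy analysis would need to control.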

Findings

01. Synthetic data from DP-CDA achieves higher utility than conventional methods.

02. DP-CDA provides stronger privacy guarantees, backed by formal privacy accounting.

03. An optimal mixing order improves the privacy-utility trade-off.
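
One intuition for why the mixing order (here taken to mean the number k of samples averaged into each synthetic point) affects the privacy-utility trade-off can be sketched with the classic Gaussian mechanism. Assuming rows are clipped to L2 norm C and neighbouring datasets differ by replacing one sample, the L2 sensitivity of a k-sample mean is 2C/k, so the noise scale σ = Δ·sqrt(2 ln(1.25/δ))/ε needed for (ε, δ)-differential privacy shrinks as 1/k. The helper `gaussian_sigma_for_mean` is hypothetical, covers a single release only (no composition across many synthetic points), and is not the formal accounting used in the paper.

```python
import math


def gaussian_sigma_for_mean(k, clip_norm, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism for a mean of k clipped samples.

    Assumptions: each sample has L2 norm <= clip_norm, and neighbouring datasets
    differ by replacing one sample, so the L2 sensitivity of the k-sample mean is
    2 * clip_norm / k. The classic bound is valid for 0 < epsilon < 1 and covers
    a single release only.
    """
    if not 0 < epsilon < 1:
        raise ValueError("classic Gaussian mechanism bound assumes 0 < epsilon < 1")
    sensitivity = 2.0 * clip_norm / k
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


# The required noise halves each time the mixing order k doubles.
for k in (1, 2, 4, 8):
    print(k, round(gaussian_sigma_for_mean(k, clip_norm=1.0, epsilon=0.5, delta=1e-5), 3))
```

Larger k lowers the noise per synthetic point, but each point then averages away more of the individual variation within its class. That tension is one way to read why an intermediate mixing order could give the best trade-off.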

Abstract

In recent years, the growth of data across various sectors, including healthcare, security, finance, and education, has created significant opportunities for analysis and informed decision-making. However, these datasets often contain sensitive and personal information, which raises serious privacy concerns. It has been shown in multiple works that a person's identity is intertwined with their data, even if the data is anonymized. Due to this lack of separation between a person's identity and their information, the patterns associated with an individual's information can uniquely identify them. Protecting individual privacy is crucial, yet many existing machine learning and data publishing algorithms struggle with high-dimensional data, facing challenges related to the trade-off between computational efficiency and privacy. To address these challenges, we introduce an effective data…