Enabling PSO-Secure Synthetic Data Sharing Using Diversity-Aware Diffusion Models

Mischa Dombrowski; Bernhard Kainz

arXiv:2506.17975 · cs.CV · June 24, 2025

TL;DR

This paper introduces a novel framework for training diffusion models to generate synthetic data that balances high fidelity with privacy protection, ensuring compliance with data regulations like GDPR while maintaining near real-data performance.

Contribution

The work presents a new diversity-aware diffusion model training method that produces privacy-preserving synthetic datasets with high utility and legal compliance.

Findings

01 Synthetic datasets achieve within 1% of real data performance.

02 The method significantly outperforms existing privacy-preserving synthetic data techniques.

03 The approach ensures compliance with GDPR and enhances data sharing in sensitive domains.

Abstract

Synthetic data has recently reached a level of visual fidelity that makes it nearly indistinguishable from real data, offering great promise for privacy-preserving data sharing in medical imaging. However, fully synthetic datasets still suffer from significant limitations. First and foremost, the legal aspect of sharing synthetic data is often neglected, and data regulations such as the GDPR are largely ignored. Secondly, synthetic models fall short of matching the performance of real data, even for in-domain downstream applications. Recent methods for image generation have focused on maximizing image diversity instead of fidelity solely to improve the mode coverage and therefore the downstream performance of synthetic data. In this work, we shift perspective and highlight how maximizing diversity can also be interpreted as protecting natural persons from being singled out, which leads…
