DP-GENG: Differentially Private Dataset Distillation Guided by DP-Generated Data

Shuo Shi; Jinghuai Zhang; Shijie Jiang; Chunyi Zhou; Yuyuan Li; Mengying Zhu; Yangyang Wu; Tianyu Du

arXiv:2511.09876·cs.CR·November 14, 2025

TL;DR

This paper introduces DP-GENG, a differentially private dataset distillation framework that uses DP-generated data to improve the realism and utility of the distilled dataset. In experiments it outperforms existing DP-DD methods.

Contribution

DP-GENG combines DP-generated data with dataset distillation, improving realism and utility while maintaining formal privacy guarantees under limited privacy budgets.

Findings

01

DP-GENG outperforms state-of-the-art DP-DD methods in utility.

02

DP-GENG provides stronger robustness against membership inference attacks.

03

Theoretical analysis confirms the privacy guarantees of DP-GENG.

Abstract

Dataset distillation (DD) compresses large datasets into smaller ones while preserving the performance of models trained on them. Although DD is often assumed to enhance data privacy by aggregating over individual examples, recent studies reveal that standard DD can still leak sensitive information from the original dataset due to the lack of formal privacy guarantees. Existing differentially private (DP)-DD methods attempt to mitigate this risk by injecting noise into the distillation process. However, they often fail to fully leverage the original dataset, resulting in degraded realism and utility. This paper introduces DP-GENG, a novel framework that addresses the key limitations of current DP-DD by leveraging DP-generated data. Specifically, DP-GENG initializes the distilled dataset with DP-generated data to enhance realism. Then, generated data refines the DP-feature matching technique…
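The abstract mentions injecting noise into distillation and matching features under DP. As a rough illustration of that general idea only, the sketch below releases a noisy mean of per-example feature vectors with the Gaussian mechanism and scores a synthetic set against it. It is not the paper's algorithm: the function names `dp_mean_feature` and `feature_matching_loss`, the parameters `clip_norm` and `noise_multiplier`, and the assumption that the dataset size is public are all hypothetical, and privacy accounting (converting the noise level into an (ε, δ) guarantee) is omitted.

```python
import numpy as np


def dp_mean_feature(features, clip_norm, noise_multiplier, rng):
    """Noisy mean of per-example feature vectors (Gaussian mechanism sketch).

    Assumption: `features` has shape (n, d) and n is treated as public.
    """
    # Clip each row to L2 norm <= clip_norm so a single example's
    # contribution to the sum is bounded.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = features * scale
    # Add Gaussian noise scaled to the clipping bound, then average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=features.shape[1])
    return (clipped.sum(axis=0) + noise) / features.shape[0]


def feature_matching_loss(synthetic_features, noisy_target):
    """Squared distance between the synthetic mean feature and the noisy target."""
    diff = synthetic_features.mean(axis=0) - noisy_target
    return float(diff @ diff)
```

In a distillation loop, the synthetic examples would be updated to reduce this loss. Only the noisy target touches the private data, so further optimization against it is post-processing.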

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Adversarial Robustness in Machine Learning