A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness

Zongxiong Chen; Jiahui Geng; Derui Zhu; Herbert Woisetschlaeger; Qing Li; Sonja Schimmler; Ruben Mayer; Chunming Rong

arXiv:2305.03355 · cs.LG · May 30, 2023 · 1 cites

TL;DR

This paper provides a comprehensive evaluation of dataset distillation, highlighting its performance benefits, privacy risks, impact on robustness, and fairness, supported by extensive experiments and benchmarking.

Contribution

It offers the first systematic analysis of security, robustness, and fairness issues in dataset distillation, along with a large-scale benchmarking framework.

Findings

01. Membership inference attacks show that privacy risks remain.

02. Dataset distillation affects model robustness to varying degrees.

03. Dataset distillation can amplify model unfairness across classes.

Abstract

The aim of dataset distillation is to encode the rich features of an original dataset into a tiny dataset. It is a promising approach to accelerate neural network training and related studies. Different approaches have been proposed to improve the informativeness and generalization performance of distilled images. However, no work has comprehensively analyzed this technique from a security perspective and there is a lack of systematic understanding of potential risks. In this work, we conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods. We successfully use membership inference attacks to show that privacy risks still remain. Our work also demonstrates that dataset distillation can cause varying degrees of impact on model robustness and amplify model unfairness across classes when making predictions. This work offers a large-scale benchmarking…
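
The two risks named in the abstract can be made concrete with a minimal sketch. It is not the paper's evaluation code: the loss-threshold rule below is one common, simple form of membership inference, and the per-class accuracy gap is one possible way to quantify unfairness across classes. All function names, the threshold, and the sample values are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for every sample whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses, nonmember_losses, threshold):
    """Fraction of correct membership guesses over members and non-members."""
    guesses_in = loss_threshold_attack(member_losses, threshold)
    guesses_out = loss_threshold_attack(nonmember_losses, threshold)
    # A guess is correct if a member is flagged or a non-member is not flagged.
    correct = sum(guesses_in) + sum(not g for g in guesses_out)
    return correct / (len(member_losses) + len(nonmember_losses))


def per_class_accuracy(labels, predictions):
    """Accuracy computed separately for each class label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for y, p in zip(labels, predictions):
        totals[y] += 1
        hits[y] += int(y == p)
    return {c: hits[c] / totals[c] for c in totals}


def accuracy_gap(labels, predictions):
    """Difference between the best and worst per-class accuracy."""
    acc = per_class_accuracy(labels, predictions)
    return max(acc.values()) - min(acc.values())


if __name__ == "__main__":
    # Hypothetical per-sample losses of a model trained on distilled data.
    members = [0.1, 0.2, 0.9]
    nonmembers = [1.5, 0.3, 2.0]
    print("attack accuracy:", attack_accuracy(members, nonmembers, threshold=0.5))

    # Hypothetical labels and predictions for a two-class problem.
    print("accuracy gap:", accuracy_gap([0, 0, 1, 1], [0, 0, 1, 0]))
```

An attack accuracy well above 0.5 on balanced member and non-member sets would suggest that membership information leaks, and a large accuracy gap would suggest that some classes are served much worse than others.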

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · COVID-19 diagnosis using AI