What is Dataset Distillation Learning?

William Yang; Ye Zhu; Zhiwei Deng; Olga Russakovsky

arXiv:2406.04284·cs.LG·July 23, 2024

TL;DR

This paper investigates what dataset distillation actually learns: it shows where distilled data fall short of real data, how distillation retains information, and how the semantic content of synthetic data can be interpreted.

Contribution

It offers new insights into the behavior, representativeness, and information content of distilled data, and introduces a framework for interpretation.

Findings

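01. Distilled data cannot substitute for real data during training outside the standard evaluation setting for dataset distillation.

02. Distillation retains high task performance by compressing information related to the early training dynamics of models trained on real data.

03. Individual distilled data points contain meaningful semantic information.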

Abstract

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used to train high-performing models, little is understood about how the information is stored. In this study, we pose and answer three questions about the behavior, representativeness, and point-wise information content of distilled data. We reveal that distilled data cannot serve as a substitute for real data during training outside the standard evaluation setting for dataset distillation. Additionally, the distillation process retains high task performance by compressing information related to the early training dynamics of real models. Finally, we provide a framework for interpreting distilled data and reveal that individual distilled data points contain…
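To make the idea of "a compact set of synthetic data that retains essential information" concrete, here is a minimal toy sketch. It is not the paper's method and not any of the neural-network distillation algorithms it studies: it assumes ordinary least-squares regression, where a dataset of n points can be compressed in closed form into d synthetic points that yield the same fitted weights. The function name `distill_least_squares` and all data below are hypothetical, introduced only for illustration.

```python
import numpy as np


def distill_least_squares(X, y):
    """Toy illustration only (assumed setting: ordinary least squares).

    Returns d synthetic points (X_syn, y_syn) such that
    X_syn.T @ X_syn == X.T @ X and X_syn.T @ y_syn == X.T @ y,
    so the least-squares fit on the synthetic set equals the fit on (X, y).
    Requires X to have full column rank.
    """
    gram = X.T @ X                        # d x d second-moment matrix of the real data
    L = np.linalg.cholesky(gram)          # gram = L @ L.T with L lower-triangular
    X_syn = L.T                           # then X_syn.T @ X_syn = L @ L.T = gram
    y_syn = np.linalg.solve(L, X.T @ y)   # then X_syn.T @ y_syn = L @ y_syn = X.T @ y
    return X_syn, y_syn


# Hypothetical "real" dataset: 1000 points in 5 dimensions with noisy linear labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

# Compress 1000 real points into 5 synthetic ones.
X_syn, y_syn = distill_least_squares(X, y)

# Fit the same model on the real data and on the synthetic data.
w_real = np.linalg.lstsq(X, y, rcond=None)[0]
w_syn = np.linalg.lstsq(X_syn, y_syn, rcond=None)[0]

print("synthetic set shape:", X_syn.shape)
print("max weight difference:", np.max(np.abs(w_real - w_syn)))
```

In this toy setting the synthetic points preserve only the quantities that the least-squares objective depends on (`X.T @ X` and `X.T @ y`), so they reproduce the real-data fit for that one objective; a different loss or model family would not in general be reproduced.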

Code & Models

Repositories

princetonvisualai/What-is-Dataset-Distillation-Learning
PyTorch · Official

Taxonomy

Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification

Methods: Sparse Evolutionary Training