Dataset Distillation from First Principles: Integrating Core Information Extraction and Purposeful Learning
Vyacheslav Kungurtsev, Yuanfang Peng, Jianyang Gu, Saeed Vahidian, Anthony Quinn, Fadwa Idlahcen, Yiran Chen

TL;DR
This paper formalizes dataset distillation (DD) as an optimization problem tied to specific inference tasks, revealing its broad applications and limitations, and demonstrating its potential in medical data merging and physics-informed neural networks.
Contribution
It introduces a formal, task-specific model of DD, enabling better understanding and development of DD methods across diverse applications.
Findings
Formal model of DD tied to inference tasks
Analysis of DD methods' strengths and limitations
Numerical case studies in medical data and physics-informed neural networks
Abstract
Dataset distillation (DD) is an increasingly important technique that focuses on constructing a synthetic dataset capable of capturing the core information in training data to achieve comparable performance in models trained on the latter. While DD has a wide range of applications, the theory supporting it is less well evolved. New methods of DD are compared on a common set of benchmarks, rather than oriented towards any particular learning task. In this work, we present a formal model of DD, arguing that a precise characterization of the underlying optimization problem must specify the inference task associated with the application of interest. Without this task-specific focus, the DD problem is under-specified, and the selection of a DD algorithm for a particular task is merely heuristic. Our formalization reveals novel applications of DD across different modeling environments. We…
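The claim that DD is under-specified without an inference task can be illustrated with a deliberately tiny example. The sketch below is not the paper's formal model or any published DD algorithm: the functions `train` and `distill`, the two loss names, and the data are all hypothetical, chosen only to show that the "right" synthetic dataset changes when the task changes. Here the "model" is a single constant fitted to the data, and the distilled dataset is one point.

```python
import statistics

def train(data, loss):
    """Fit a single constant to `data` under the named loss (toy stand-in for a model)."""
    if loss == "squared":
        return statistics.fmean(data)    # the mean minimizes mean squared error
    if loss == "absolute":
        return statistics.median(data)   # the median minimizes mean absolute error
    raise ValueError(f"unknown loss: {loss}")

def distill(data, loss):
    """Return a one-point synthetic dataset that reproduces the full-data fit for `loss`."""
    return [train(data, loss)]

# Made-up data with one outlier, so the two tasks disagree strongly.
real = [1.0, 2.0, 3.0, 4.0, 100.0]

for loss in ("squared", "absolute"):
    synthetic = distill(real, loss)
    # Training on the synthetic point recovers the estimate obtained from the real data.
    assert train(synthetic, loss) == train(real, loss)
    print(loss, synthetic)

# A dataset distilled for one task need not serve another task.
mismatch = train(distill(real, "squared"), "absolute")
print("squared-loss distillate used for absolute loss:", mismatch,
      "vs. real-data estimate:", train(real, "absolute"))
```

In this toy setting each distilled point is exact for the task it was built for, while the point distilled for squared loss gives a different answer from the real data when reused under absolute loss. Realistic DD involves far richer models and objectives, so this is only an intuition aid.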
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification
Methods: Sparse Evolutionary Training
