Data-to-Model Distillation: Data-Efficient Learning Framework
Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

TL;DR
This paper introduces Data-to-Model Distillation (D2M), a novel framework that efficiently distills large datasets into a generative model's parameters, enabling scalable, high-quality synthetic data generation for various architectures and applications.
Contribution
The paper presents a new data-to-model distillation approach that improves efficiency, scalability, and generalizability over existing dataset distillation methods by embedding knowledge into a generative model.
Findings
D2M outperforms existing dataset distillation methods on 15 datasets spanning a range of image resolutions.
It scales effectively to high-resolution ImageNet-1K at 128×128.
D2M also benefits practical downstream applications such as neural architecture search.
Abstract
Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress, existing dataset distillation methods often struggle with computational efficiency, scalability to complex high-resolution datasets, and generalizability to deep architectures. These approaches typically require retraining when the distillation ratio changes, as knowledge is embedded in raw pixels. In this paper, we propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images. The learned generative model can then produce informative training images for different…
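The core idea in the abstract (update a generator's parameters so that representations of generated images match those of real images) can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration, not the authors' implementation: a toy linear generator G(z) = W z + b stands in for the pre-trained generative model, a frozen random tanh projection stands in for the network that extracts "rich representations", and the alignment objective is a simple mean-embedding (distribution-matching-style) loss with hand-derived gradients. The helper names (`features`, `alignment_loss`, `distill_into_generator`, `generate`) are hypothetical, and class conditioning is omitted.

```python
import numpy as np


def features(x, A):
    """Hypothetical frozen feature extractor phi(x) = tanh(A x).
    Stand-in for a real network that supplies 'rich representations'."""
    return np.tanh(x @ A.T)


def alignment_loss(mu_real, W, b, A, Z):
    """Squared distance between the mean real embedding and the mean
    embedding of generated samples G(z) = W z + b for latent codes Z."""
    gen = Z @ W.T + b
    diff = features(gen, A).mean(axis=0) - mu_real
    return float(diff @ diff)


def distill_into_generator(real, latent_dim=4, feat_dim=16, steps=500,
                           lr=0.1, batch=256, seed=0):
    """Toy data-to-model distillation: knowledge of `real` is stored in (W, b)."""
    rng = np.random.default_rng(seed)
    d = real.shape[1]
    A = rng.standard_normal((feat_dim, d)) / np.sqrt(d)  # frozen extractor weights
    W = 0.1 * rng.standard_normal((d, latent_dim))        # learnable generator weights
    b = np.zeros(d)                                       # learnable generator bias
    mu_real = features(real, A).mean(axis=0)              # target, computed once

    for _ in range(steps):
        Z = rng.standard_normal((batch, latent_dim))      # fresh latent codes each step
        H = features(Z @ W.T + b, A)                      # (batch, feat_dim)
        diff = H.mean(axis=0) - mu_real
        # Manual backprop of L = ||mean(H) - mu_real||^2 through tanh and A.
        dP = (2.0 / batch) * diff * (1.0 - H ** 2)        # dL / d(pre-activation)
        dG = dP @ A                                       # dL / d(generated sample)
        W -= lr * (dG.T @ Z)                              # dL / dW
        b -= lr * dG.sum(axis=0)                          # dL / db
    return W, b, A, mu_real


def generate(W, b, n, rng):
    """Sample any number of synthetic points from the distilled generator."""
    return rng.standard_normal((n, W.shape[1])) @ W.T + b


# Usage on a toy 8-dimensional "real dataset".
real = 1.0 + 0.5 * np.random.default_rng(1).standard_normal((2000, 8))
W0, b0, A, mu_real = distill_into_generator(real, steps=0)   # initial parameters
W1, b1, _, _ = distill_into_generator(real, steps=500)       # same seed, trained
Z_eval = np.random.default_rng(2).standard_normal((512, 4))
before = alignment_loss(mu_real, W0, b0, A, Z_eval)
after = alignment_loss(mu_real, W1, b1, A, Z_eval)

# The same trained generator serves different synthetic-set sizes without retraining.
small = generate(W1, b1, 10, np.random.default_rng(3))
large = generate(W1, b1, 50, np.random.default_rng(4))
print(f"alignment loss before={before:.4f} after={after:.4f}")
print(small.shape, large.shape)
```

The last two calls reflect the abstract's motivation in miniature: because the distilled knowledge lives in the generator's parameters rather than in a fixed set of pixels, a larger or smaller synthetic set is obtained by sampling more or fewer latent codes instead of rerunning distillation.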
