Data-to-Model Distillation: Data-Efficient Learning Framework

Ahmad Sajedi; Samir Khaki; Lucy Z. Liu; Ehsan Amjadian; Yuri A. Lawryshyn; Konstantinos N. Plataniotis

arXiv:2411.12841 · cs.CV · November 21, 2024

TL;DR

This paper introduces Data-to-Model Distillation (D2M), a novel framework that efficiently distills large datasets into a generative model's parameters, enabling scalable, high-quality synthetic data generation for various architectures and applications.

Contribution

The paper presents a new data-to-model distillation approach that improves efficiency, scalability, and generalizability over existing dataset distillation methods by embedding knowledge into a generative model.

Findings

01. D2M outperforms existing dataset distillation methods on 15 datasets of varying resolutions.

02. It scales to high-resolution 128x128 ImageNet-1K.

03. D2M brings practical benefits to downstream applications such as neural architecture search.

Abstract

Dataset distillation aims to distill the knowledge of a large-scale real dataset into small yet informative synthetic data such that a model trained on it performs as well as a model trained on the full dataset. Despite recent progress, existing dataset distillation methods often struggle with computational efficiency, scalability to complex high-resolution datasets, and generalizability to deep architectures. These approaches typically require retraining when the distillation ratio changes, as knowledge is embedded in raw pixels. In this paper, we propose a novel framework called Data-to-Model Distillation (D2M) to distill the real dataset's knowledge into the learnable parameters of a pre-trained generative model by aligning rich representations extracted from real and generated images. The learned generative model can then produce informative training images for different…
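The core idea in the abstract, tuning a generator's parameters so that features of generated samples align with features of real samples, can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the scalar "generator" (`w * z + b`), the hand-picked feature map (first and second moments), the mean-feature loss, and every function name below are assumptions chosen so the example runs with only the Python standard library.

```python
import random

def features(x):
    # Frozen toy "feature extractor" (assumption): first and second moments.
    return (x, x * x)

def mean_features(xs):
    feats = [features(x) for x in xs]
    n = len(feats)
    return (sum(f[0] for f in feats) / n, sum(f[1] for f in feats) / n)

def generate(params, latents):
    # Toy generator (assumption): an affine map of each latent code.
    w, b = params
    return [w * z + b for z in latents]

def alignment_loss(params, latents, real_mean):
    g1, g2 = mean_features(generate(params, latents))
    return (g1 - real_mean[0]) ** 2 + (g2 - real_mean[1]) ** 2

def distill(real, latents, steps=1000, lr=0.02):
    # Gradient descent on the generator parameters only; the real data
    # enters solely through its mean feature vector.
    real_mean = mean_features(real)
    n = len(latents)
    mz = sum(latents) / n
    w, b = 1.0, 0.0  # stands in for a "pre-trained" starting point
    for _ in range(steps):
        xs = generate((w, b), latents)
        g1, g2 = mean_features(xs)
        d1, d2 = g1 - real_mean[0], g2 - real_mean[1]
        mxz = sum(x * z for x, z in zip(xs, latents)) / n
        # Analytic gradients of d1^2 + d2^2 with respect to w and b.
        grad_w = 2 * d1 * mz + 2 * d2 * (2 * mxz)
        grad_b = 2 * d1 + 2 * d2 * (2 * g1)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def sample(params, n, seed):
    # After distillation, any number of synthetic samples can be drawn
    # from fresh latent codes without touching the real data again.
    rng = random.Random(seed)
    return generate(params, [rng.gauss(0.0, 1.0) for _ in range(n)])

rng = random.Random(0)
real = [rng.gauss(0.5, 1.5) for _ in range(500)]
latents = [rng.gauss(0.0, 1.0) for _ in range(256)]
params = distill(real, latents)
small = sample(params, 10, seed=1)
large = sample(params, 50, seed=2)
```

The last two lines show why storing knowledge in generator parameters rather than in raw pixels avoids re-distillation in this toy setting: the same `params` serve a budget of 10 or 50 synthetic samples.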

Code & Models

Repositories

DataDistillation/D2M (official)

Taxonomy

Topics: Fault Detection and Control Systems · Machine Learning and Algorithms · Machine Learning and Data Classification