MIM4DD: Mutual Information Maximization for Dataset Distillation

Yuzhang Shang; Zhihang Yuan; Yan Yan

arXiv:2312.16627 · cs.LG · December 29, 2023 · 1 citation

TL;DR

MIM4DD introduces a mutual information maximization approach for dataset distillation. It improves synthetic data quality by explicitly measuring the information shared between the real and synthetic datasets, and it can enhance existing state-of-the-art methods.

Contribution

It proposes using mutual information as a metric for dataset distillation and develops a contrastive learning framework to maximize MI between real and synthetic datasets.
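
To make the contrastive idea concrete, the following is a minimal sketch of an InfoNCE-style objective, a widely used contrastive lower bound on mutual information. It is not the paper's implementation: the function name `info_nce_lower_bound`, the `temperature` value, the use of plain NumPy feature arrays, and the row-wise pairing of real and synthetic features are all assumptions made for illustration.

```python
import numpy as np


def info_nce_lower_bound(real_feats, syn_feats, temperature=0.1):
    """Hypothetical helper: InfoNCE loss and the MI lower bound it implies.

    real_feats, syn_feats: arrays of shape (n, d). Row i of each array is
    assumed to form a positive pair; all other rows act as negatives.
    Returns (loss, bound), where bound = log(n) - loss.
    """
    # L2-normalize so that the dot product is a cosine similarity.
    r = real_feats / np.linalg.norm(real_feats, axis=1, keepdims=True)
    s = syn_feats / np.linalg.norm(syn_feats, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = (r @ s.T) / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positive pairs sit on the diagonal.
    loss = -np.mean(np.diag(log_probs))
    bound = np.log(len(r)) - loss
    return loss, bound
```

Minimizing `loss` with respect to the synthetic features raises the bound, which can never exceed log(n) for n pairs. In a distillation setting the features would presumably come from a network applied to real and synthetic samples, and the loss would be added to an existing distillation objective; how pairs are formed is a design choice this sketch does not settle.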

Findings

01. MIM4DD effectively enhances dataset distillation performance.

02. The method can be used as an add-on to existing techniques.

03. Experimental results demonstrate improved test accuracy.

Abstract

Dataset distillation (DD) aims to synthesize a small dataset such that a model trained on it achieves test performance comparable to the same model trained on the full dataset. State-of-the-art (SoTA) methods optimize synthetic datasets primarily by matching heuristic indicators, such as gradients and training trajectories, extracted from two networks: one trained on real data and one trained on synthetic data (see Fig. 1, left). DD is essentially a compression problem that emphasizes preserving as much of the information contained in the data as possible. We argue that well-defined information-theoretic metrics, which measure the amount of information shared between variables, are necessary for measuring success but have not been considered in prior work. Thus, we introduce mutual information (MI) as the metric to quantify the information shared between the synthetic and real datasets, and devise MIM4DD numerically maximizing the MI via a…

Code & Models

Repositories

Guang000/Awesome-Dataset-Distillation

Taxonomy

Topics: Face and Expression Recognition · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning