Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-Aware Conditional Mutual Information

Xinhao Zhong; Bin Chen; Hao Fang; Xulin Gu; Shu-Tao Xia; En-Hui Yang

arXiv:2412.09945·cs.CV·May 20, 2025

TL;DR

This paper introduces a class-aware conditional mutual information approach to improve dataset distillation, resulting in more learnable synthetic datasets that enhance training efficiency and performance.

Contribution

The paper proposes minimizing class-aware conditional mutual information as a regularizer for dataset distillation, addressing the tendency of synthetic datasets to be excessively difficult for neural networks to learn from.

Findings

01. Improved training efficiency with smaller synthetic datasets.

02. Enhanced performance of models trained on distilled datasets.

03. Versatile regularization method applicable to existing DD techniques.

Abstract

Dataset distillation (DD) aims to minimize the time and memory consumption needed for training deep neural networks on large datasets, by creating a smaller synthetic dataset that yields performance similar to that of the full real dataset. However, current dataset distillation methods often result in synthetic datasets that are excessively difficult for networks to learn from, due to the compression of a substantial amount of information from the original data through metrics measuring feature similarity, e.g., distribution matching (DM). In this work, we introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset and propose a novel method by minimizing CMI. Specifically, we minimize the distillation loss while constraining the class-aware complexity of the synthetic dataset by minimizing its empirical CMI from the feature space of pre-trained…
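
The idea of measuring class-aware complexity with an empirical CMI can be illustrated with a small sketch. It assumes the commonly used estimator of I(X; Ŷ | Y): for each class, average the predicted distributions of its samples into a class centroid, then take the mean KL divergence from each sample's prediction to that centroid, weighted by class frequency. The function names, the use of raw logits from a pre-trained classifier, and the additive penalty with `weight` are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Convert one sample's logits into a probability vector."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def empirical_cmi(logits_per_sample, labels):
    """Assumed estimator: class-frequency-weighted mean of
    KL(prediction || class centroid) over the synthetic samples."""
    probs = [softmax(z) for z in logits_per_sample]

    # Group predicted distributions by ground-truth class.
    by_class = {}
    for p, y in zip(probs, labels):
        by_class.setdefault(y, []).append(p)

    n = len(labels)
    cmi = 0.0
    for members in by_class.values():
        k = len(members[0])
        # Class centroid: mean predicted distribution within the class.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(k)]
        class_term = sum(kl_divergence(p, centroid) for p in members) / len(members)
        cmi += (len(members) / n) * class_term
    return cmi


def regularized_loss(distill_loss, cmi, weight=1.0):
    """Hypothetical plug-in form: base distillation loss plus a CMI penalty."""
    return distill_loss + weight * cmi
```

Under this estimator, a synthetic set whose samples all produce the same prediction within each class has a CMI of zero, and the value grows as predictions within a class scatter. In an actual distillation loop the same quantity would be computed with a differentiable tensor library so that gradients reach the synthetic images.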

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The idea of using CMI in dataset distillation to address the inherent class-aware complexity issue is interesting.
2. The experiments are conducted based on multiple datasets and various model architectures, providing solid evidence for the method's effectiveness.
3. The proposed method CMI is a versatile, "plug-and-play" regularization component that can be applied to numerous dataset distillation methods, such as DSA, MTT, and IDC. This flexibility allows the approach to generalize across

Weaknesses

1. While the paper clearly demonstrates the benefits of the CMI constraint, the method also introduces additional computational overhead, especially when dealing with high-resolution datasets. Although the authors briefly mention several strategies for mitigating this cost (e.g., reducing the frequency of CMI calculations), a more thorough discussion on balancing cost and performance might strengthen the practical feasibility.
2. Although empirical evidence is strong, the theoretical basis for CMI as a regul

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The proposed CMI method is a relatively simple yet effective approach that is plug-and-play in nature. It has demonstrated its effectiveness across multiple baseline methods.
2. The motivation behind the method proposed in the paper is solid and is supported by a certain theoretical foundation.
3. The experiments in the paper are comprehensive, conducted across various scales of datasets.

Weaknesses

1. There are now newer and more powerful methods available, such as "Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching" (ICLR 2024). The authors could consider experimenting with their proposed method on these methods.
2. The description of the method in the paper could be clearer, particularly regarding the explanation of the formula symbols, to better emphasize the key points of the approach. Currently, it appears somewhat ambiguous.
3. In my view, using mutual i

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The strengths of this paper lie in its comprehensive experimentation across diverse datasets and network architectures, which effectively demonstrates the versatility and robustness of the proposed method. Furthermore, the method's ability to be integrated as a plug-and-play module into existing dataset distillation techniques, regardless of their optimization objectives, showcases its innovation and flexibility, making it a significant contribution to the field.

Weaknesses

The paper lacks a clear discussion of the limitations of the proposed method. Furthermore, the authors should consider using more intuitive explanations, visual aids, and pseudocode to help readers better understand the technical details of the method.

Code & Models

Repositories

ndhg1213/cmidd
PyTorch · Official

Taxonomy

Topics: Face and Expression Recognition

Methods: Sparse Evolutionary Training