Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-Aware Conditional Mutual Information
Xinhao Zhong, Bin Chen, Hao Fang, Xulin Gu, Shu-Tao Xia, En-Hui Yang

TL;DR
This paper introduces a class-aware conditional mutual information (CMI) regularizer for dataset distillation. It makes synthetic datasets easier for networks to learn from, which improves both training efficiency and the performance of models trained on them.
Contribution
It proposes minimizing class-aware conditional mutual information as a regularizer for dataset distillation. This addresses the problem that synthetic datasets are often too difficult for neural networks to learn from.
Findings
Improved training efficiency with smaller synthetic datasets.
Enhanced performance of models trained on distilled datasets.
Versatile regularization method applicable to existing DD techniques.
Abstract
Dataset distillation (DD) aims to reduce the time and memory needed to train deep neural networks on large datasets. It does so by creating a smaller synthetic dataset that achieves performance similar to that of the full real dataset. However, current dataset distillation methods often produce synthetic datasets that are excessively difficult for networks to learn from. This happens because they compress a substantial amount of information from the original data through metrics that measure feature similarity, e.g., distribution matching (DM). In this work, we introduce conditional mutual information (CMI) to assess the class-aware complexity of a dataset and propose a novel method that minimizes CMI. Specifically, we minimize the distillation loss while constraining the class-aware complexity of the synthetic dataset by minimizing its empirical CMI from the feature space of pre-trained…
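The abstract describes the regularizer only at a high level. The sketch below is a rough illustration built on several assumptions. It uses the standard plug-in estimate of the conditional mutual information I(X; Ŷ | Y). To compute it, pass each synthetic sample through a pre-trained network to get a predicted distribution P_i. Then average, for each class y, the P_i of that class to get a centroid Q_y. The estimate is CMI = (1/n) Σ_i KL(P_i ‖ Q_{y_i}). The choice of softmax outputs as the "feature space", the helper names, and the weight `lam` are hypothetical choices for illustration, not the authors' code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def class_centroids(probs: Sequence[Sequence[float]],
                    labels: Sequence[int]) -> Dict[int, List[float]]:
    """Return Q_y: the average predicted distribution of all samples labeled y."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for p, y in zip(probs, labels):
        if y not in sums:
            sums[y] = [0.0] * len(p)
        sums[y] = [s + pk for s, pk in zip(sums[y], p)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats. Terms with p_k = 0 contribute nothing.

    When q is the centroid of a class containing p, q_k >= p_k / n_y > 0
    whenever p_k > 0, so the logarithm below is always defined.
    """
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def empirical_cmi(probs: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y_hat | Y) with X uniform over the samples:
    the mean KL divergence from each sample's prediction to its class centroid."""
    centroids = class_centroids(probs, labels)
    total = sum(kl_divergence(p, centroids[y]) for p, y in zip(probs, labels))
    return total / len(probs)


def regularized_loss(dd_loss: float,
                     probs: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     lam: float = 0.1) -> float:
    """Hypothetical combined objective: distillation loss + lam * empirical CMI."""
    return dd_loss + lam * empirical_cmi(probs, labels)


if __name__ == "__main__":
    # Toy softmax outputs for four synthetic images in two classes.
    probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.2, 0.8]]
    labels = [0, 0, 1, 1]
    print(empirical_cmi(probs, labels))
    print(regularized_loss(1.0, probs, labels, lam=0.5))
```

In this toy example, class 1 contributes zero CMI because both of its samples produce identical predictions. Class 0 contributes a small positive amount. Under this reading, a lower CMI means that predictions within each class cluster more tightly around their centroid. In a real distillation loop, the same quantity would be computed on differentiable tensors so that its gradient flows back into the synthetic images.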
Peer Reviews
Decision: ICLR 2025 Poster
1. The idea of using CMI in dataset distillation to address the inherent class-aware complexity issue is interesting. 2. The experiments are conducted based on multiple datasets and various model architectures, providing solid evidence for the method's effectiveness. 3. The proposed method CMI is a versatile, "plug-and-play" regularization component that can be applied to numerous dataset distillation methods, such as DSA, MTT, and IDC. This flexibility allows the approach to generalize across
1. While the paper clearly demonstrates the benefits of the CMI constraint, the method also introduces additional computational overhead, especially on high-resolution datasets. Although the authors briefly mention several strategies for mitigating this cost (e.g., reducing the frequency of CMI calculations), a more thorough discussion of the trade-off between cost and performance would strengthen the case for its practical feasibility. 2. Although the empirical evidence is strong, the theoretical basis for CMI as a regul
1. The proposed CMI method is a relatively simple yet effective plug-and-play approach, and its effectiveness is demonstrated across multiple baseline methods. 2. The motivation behind the proposed method is solid and is supported by some theoretical foundation. 3. The experiments are comprehensive, covering datasets of various scales.
1. Newer and more powerful methods are now available, such as "Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching" (ICLR 2024). The authors could consider applying their proposed method to these methods as well. 2. The description of the method could be clearer, particularly the explanation of the formula symbols, so that the key points of the approach stand out. As written, it is somewhat ambiguous. 3. In my view, using mutual i
The strengths of this paper lie in its comprehensive experimentation across diverse datasets and network architectures, which effectively demonstrates the versatility and robustness of the proposed method. Furthermore, the method's ability to be integrated as a plug-and-play module into existing dataset distillation techniques, regardless of their optimization objectives, showcases its innovation and flexibility, making it a significant contribution to the field.
The paper lacks a clear discussion of the limitations of the proposed method. Furthermore, the authors should consider using more intuitive explanations, visual aids, and pseudocode to help readers better understand the technical details of the method.
