Grounding and Enhancing Informativeness and Utility in Dataset Distillation
Shaobo Wang, Yantai Yang, Guo Chen, Peiru Li, Kaixin Li, Yufa Zhou, Zhaorun Chen, Linfeng Zhang

TL;DR
This paper introduces a theoretically grounded framework for dataset distillation that balances informativeness and utility, leading to improved performance on large-scale image datasets.
Contribution
It proposes InfoUtil, a novel method combining game-theoretic and gradient-based techniques to optimize synthetic datasets for better informativeness and utility.
Findings
Achieves a 6.1% performance improvement on ImageNet-1K with ResNet-18.
Provides a theoretical foundation for understanding dataset distillation.
Demonstrates the effectiveness of combining Shapley Value and Gradient Norm in dataset synthesis.
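To make the Gradient Norm idea concrete, here is a minimal sketch of gradient-norm-based sample scoring. It is an illustration under stated assumptions, not the paper's implementation. The score used is the L2 norm of the cross-entropy gradient with respect to the logits (softmax probabilities minus the one-hot label), a common stand-in for a full parameter-gradient norm. The function names, the toy logits, and the choice to keep the highest-scoring samples are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logit_grad_norm(logits: Sequence[float], label: int) -> float:
    """L2 norm of d(cross-entropy)/d(logits), which equals softmax(logits) - onehot(label)."""
    probs = softmax(logits)
    grad = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))


def select_top_k(batch_logits: Sequence[Sequence[float]],
                 labels: Sequence[int], k: int) -> List[int]:
    """Return indices of the k samples with the largest gradient-norm score."""
    scores = [logit_grad_norm(z, y) for z, y in zip(batch_logits, labels)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Hypothetical toy batch: three samples, three classes, all labeled class 0.
toy_logits = [
    [4.0, 0.0, 0.0],  # confidently correct -> small gradient
    [0.0, 0.0, 0.0],  # uncertain           -> medium gradient
    [0.0, 3.0, 0.0],  # confidently wrong   -> large gradient
]
toy_labels = [0, 0, 0]
selected = select_top_k(toy_logits, toy_labels, k=2)
```

In this toy batch the confidently correct sample receives the smallest score and is dropped. Whether the paper ranks samples in exactly this direction, or with respect to which parameters it takes the gradient, is not stated on this page.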
Abstract
Dataset Distillation (DD) seeks to create a compact dataset from a large, real-world dataset. While recent methods often rely on heuristic approaches to balance efficiency and quality, the fundamental relationship between original and synthetic data remains underexplored. This paper revisits knowledge distillation-based dataset distillation within a solid theoretical framework. We introduce the concepts of Informativeness and Utility, capturing crucial information within a sample and essential samples in the training set, respectively. Building on these principles, we define optimal dataset distillation mathematically. We then present InfoUtil, a framework that balances informativeness and utility in synthesizing the distilled dataset. InfoUtil incorporates two key components: (1) game-theoretic informativeness maximization using Shapley Value attribution to extract key information from…
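As a rough illustration of Shapley Value attribution over image regions, the sketch below estimates Shapley values by averaging marginal contributions over random orderings of the players (here, patches). It is a generic Monte Carlo estimator, not the paper's procedure. The value function, the patch weights, and all names are hypothetical. In practice the value function might be a model's confidence on an image with only the chosen patches visible, which is an assumption rather than something stated in the abstract.

```python
import random
from typing import Callable, FrozenSet, List


def shapley_monte_carlo(
    num_players: int,
    value_fn: Callable[[FrozenSet[int]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> List[float]:
    """Estimate Shapley values by sampling random player orderings."""
    rng = random.Random(seed)
    totals = [0.0] * num_players
    players = list(range(num_players))
    for _ in range(num_permutations):
        rng.shuffle(players)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in players:
            # Marginal contribution of player p when it joins the current coalition.
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return [t / num_permutations for t in totals]


# Hypothetical additive game: each patch contributes a fixed amount.
# For an additive game the Shapley value of a patch equals its weight,
# which makes the estimator easy to sanity-check.
PATCH_WEIGHTS = [0.5, 0.1, 0.3, 0.1]


def toy_value(coalition: FrozenSet[int]) -> float:
    return sum(PATCH_WEIGHTS[i] for i in coalition)


scores = shapley_monte_carlo(len(PATCH_WEIGHTS), toy_value)
# Keep the two highest-scoring patches as the "key information" of the sample.
top_patches = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
```

With a real model the value function is not additive, so the estimate depends on the number of sampled permutations, and each permutation costs one model evaluation per patch.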
Peer Reviews
Decision · ICLR 2026 Poster
1. To the best of my knowledge, the proposed method is novel. 2. The work is well motivated as many dataset distillation methods lack theoretical foundations. 3. The proposed method outperforms two existing knowledge distillation-based dataset distillation methods: SRe2L and RDED.
**1.** The storage budget for the resultant distilled dataset is missing from the paper. One of the key problems with knowledge-based dataset distillation methods is that the resulting distilled dataset can become considerably larger than expected due to soft-labeling [1,2], defeating the whole purpose of dataset distillation. Without information on storage size, the usefulness of the proposed method in practice is unclear. **2.** The performance in the most standard dataset distillation s
1. Presents a clear theoretical framework that connects informativeness and utility under a unified definition of optimal dataset distillation. 2. The use of Shapley-value–based attribution provides interpretability and theoretical grounding to the distilled sample selection. 3. Demonstrates strong empirical performance across multiple datasets and architectures, showing robustness and scalability. 4. Ablation studies and cross-architecture evaluations validate each component’s contribution and
1. Although informativeness is formally defined as an optimization over binary masks (Eq. 2), the method substitutes this process with a Shapley-value attribution heuristic. The paper does not show that Shapley-based patch selection approximates or lower-bounds the true informativeness objective, nor does it provide conditions under which this substitution is valid. This leap from optimization to attribution lacks formal grounding, especially given nonlinear interactions among image regions. 2.
The strength of this work lies in its solid theoretical foundation and practical effectiveness. Unlike prior heuristic or empirical approaches, InfoUtil unifies the concepts of informativeness and utility within a principled mathematical framework, providing interpretability and transparency to dataset distillation. Its combination of game-theoretic Shapley value attribution and gradient norm-based sample selection ensures that the distilled data are both highly informative and influential for m
Despite its strong theoretical grounding and empirical performance, the paper contains a few minor weaknesses. There are some typographical errors. 1. In lines 89–90, in the phrase "we reconsider the knowledge distillation-based dataset distillation process by introducing Principled Dataset Distillation (Definition 4)", "Principled Dataset Distillation" should read "Optimal Dataset Distillation" to maintain consistency with the terminology used throughout the paper. 2. Similarly, in line 323, the word "nclude" in “Baseline nclude tr
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
