Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation
Qiyan Deng, Changqian Zheng, Lianpeng Qiao, Yuping Wang, Chengliang Chai, Lei Cao

TL;DR
This paper introduces Influence-Weighted Distillation (IWD), a novel framework that uses influence functions to assign adaptive weights to data instances, improving dataset distillation quality and model performance.
Contribution
The work presents a new influence-based weighting method for dataset distillation that accounts for data quality, improving effectiveness over traditional approaches that weight all instances uniformly.
Findings
Improves distilled dataset quality and model accuracy by up to 7.8 percentage points.
Integrates seamlessly into existing distillation frameworks.
Prioritizes beneficial data and reduces the impact of harmful instances.
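Assuming the headline figure corresponds to the largest gain reported in the peer reviews (gradient matching at 10 IPC, from 44.9% to 52.7%), it is an absolute gain in accuracy. The relative improvement over the baseline is larger:

```latex
\Delta_{\text{abs}} = 52.7\% - 44.9\% = 7.8 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{52.7 - 44.9}{44.9} \approx 17.4\%
```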
Abstract
Dataset distillation condenses large datasets into synthetic subsets, achieving performance comparable to training on the full dataset while substantially reducing storage and computation costs. Most existing dataset distillation methods assume that all real instances contribute equally to the process. In practice, real-world datasets contain both informative and redundant or even harmful instances, and directly distilling the full dataset without considering data quality can degrade model performance. In this work, we present Influence-Weighted Distillation (IWD), a principled framework that leverages influence functions to explicitly account for data quality in the distillation process. IWD assigns adaptive weights to each instance based on its estimated impact on the distillation objective, prioritizing beneficial data while downweighting less useful or harmful instances. Owing to its…
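The abstract describes the weighting mechanism only at a high level. As a rough illustration, the NumPy sketch below shows what influence-based instance weighting could look like for gradient matching (DC), with a logistic-regression model standing in for the network. It is not the authors' implementation. The helper names (`influence_scores`, `influence_weights`, `weighted_gradient_matching_loss`), the clean validation set used as the influence target (the paper targets the distillation objective), the damped exact Hessian, and the softmax-with-temperature mapping from scores to weights are all assumptions.

```python
import numpy as np

# Minimal sketch with hypothetical helpers, not the IWD reference code.
# Logistic regression stands in for the network whose gradients are matched.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    """Per-example gradients of the logistic loss w.r.t. w, shape (n, d)."""
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X

def influence_scores(w, X, y, X_val, y_val, damping=1e-2):
    """Score each training example by how much upweighting it would lower
    the validation loss: g_i^T H^{-1} g_val (the negated classic influence).
    Assumption: a held-out validation loss serves as the influence target."""
    n, d = X.shape
    p = sigmoid(X @ w)
    # Damped Hessian of the mean training loss (exact; only viable for small d).
    H = (X * (p * (1 - p))[:, None]).T @ X / n + damping * np.eye(d)
    g_val = per_example_grads(w, X_val, y_val).mean(axis=0)
    s = np.linalg.solve(H, g_val)  # H^{-1} g_val
    return per_example_grads(w, X, y) @ s

def influence_weights(scores, temperature=1.0):
    """Map scores to positive weights via a softmax, rescaled to mean 1.
    Assumption: softmax with temperature; other mappings are possible."""
    z = (scores - scores.max()) / temperature  # shift for numerical stability
    e = np.exp(z)
    return len(scores) * e / e.sum()

def weighted_gradient_matching_loss(w, X, y, weights, S, y_s):
    """Cosine distance between the influence-weighted real gradient and the
    gradient on the synthetic set S, as in gradient matching."""
    g_real = weights @ per_example_grads(w, X, y) / weights.sum()
    g_syn = per_example_grads(w, S, y_s).mean(axis=0)
    cos = g_real @ g_syn / (np.linalg.norm(g_real) * np.linalg.norm(g_syn) + 1e-12)
    return 1.0 - cos
```

With all weights equal to one, the loss reduces to standard unweighted gradient matching; lower temperatures concentrate weight on the highest-scoring instances. In an actual distillation loop the synthetic set `S` would be updated by differentiating this loss, for example with an autodiff framework; NumPy is used here only to keep the sketch dependency-light.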
Peer Reviews
Submitted to ICLR 2026
1. The proposed IWD algorithm is lightweight and can be plugged into any existing dataset distillation algorithm. 2. The paper provides adequate mathematical justification for its design of the influence calculation. 3. The proposed IWD algorithm does improve three existing dataset distillation algorithms on CIFAR-10, CIFAR-100, and SVHN, with the largest gain on gradient matching, from 44.9% to 52.7% at 10 IPC.
1. While the paper provides a large number of baselines for comparison, the proposed algorithm is applied to only three of them: gradient matching, information-intensive dataset condensation, and progressive dataset distillation. Since the proposed method is meant to augment existing methods, more comparisons are needed. For instance, trajectory matching is the most popular dataset distillation algorithm, yet no results are reported for applying IWD on top of it. 2. Followi
1. The paper addresses a fundamental yet often overlooked problem in dataset distillation: different training instances contribute unequally to the quality of the distilled dataset. The motivation is clear and intuitive, highlighting a real limitation in existing approaches. 2. The proposed method consistently improves multiple baselines (DC, IDC, PDD) across standard benchmarks (CIFAR10/100, SVHN). These results demonstrate that the influence-based weighting mechanism effectively enhances
1. The proposed IWD framework relies on costly influence function estimation involving Hessian–vector products and bi-level optimization, leading to heavy computation and poor scalability to large datasets like ImageNet-1k. It is also restricted to bi-level distillation frameworks (e.g., DD, DC, IDC, PDD) due to its dependence on outer-loop gradients, and cannot extend to more efficient uni-level methods (e.g., SRe2L [1], EDC [2], FADRM [3]). In contrast, Prune-then-Distill offers greater flexib
- IWD is designed as a plug-and-play module that can be combined with various distillation paradigms.
- The paper is easy to understand.
- The proposed method is tested only on the small CIFAR10, CIFAR100, and SVHN datasets and not on ImageNet-1k or ImageNet-21k, so its generalization is not demonstrated.
- The paper does not measure the time and compute required to calculate the influence scores, so the cost of the proposed plug-and-play method remains unclear.
- The proposed method lacks novelty: influence functions have long been used in the field
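One reviewer's main concern is the cost of influence estimation, which involves Hessian-vector products (HVPs). The sketch below, assuming a damped logistic-regression loss in NumPy, shows the common way an inverse-HVP is obtained without ever forming the Hessian: conjugate gradient that only calls an HVP routine. The helper names (`hvp`, `inverse_hvp_cg`), the damping value, and the stopping rule are illustrative assumptions; the summary does not say which inverse-HVP estimator IWD actually uses.

```python
import numpy as np

# Hypothetical helpers; logistic regression stands in for the real network.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hvp(w, X, v, damping=1e-2):
    """(H + damping*I) v for the mean logistic loss, without forming H.
    Cost: two matrix-vector products over the data, O(n*d)."""
    p = sigmoid(X @ w)
    return X.T @ (p * (1 - p) * (X @ v)) / X.shape[0] + damping * v

def inverse_hvp_cg(w, X, b, damping=1e-2, max_iter=100, tol=1e-10):
    """Solve (H + damping*I) s = b by conjugate gradient, touching H only
    through hvp(). Returns the solution and the number of HVPs used."""
    s = np.zeros(b.shape, dtype=float)
    r = np.array(b, dtype=float)  # residual b - H s, with s = 0
    direction = r.copy()
    rs_old = r @ r
    n_hvp = 0
    for _ in range(max_iter):
        if np.sqrt(rs_old) < tol:  # residual norm small enough: converged
            break
        Hd = hvp(w, X, direction, damping)
        n_hvp += 1
        alpha = rs_old / (direction @ Hd)
        s += alpha * direction
        r -= alpha * Hd
        rs_new = r @ r
        direction = r + (rs_new / rs_old) * direction
        rs_old = rs_new
    return s, n_hvp
```

Each CG iteration costs one HVP, that is, a full pass over the data, and the number of iterations grows with the conditioning of the damped Hessian. For a deep network, the dimension is the parameter count and each HVP costs a small constant multiple of a gradient evaluation, so repeating such solves inside the outer loop of a bi-level distillation method is consistent with the scalability concern raised for datasets such as ImageNet-1k.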
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Data Quality and Management
