Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation

Qiyan Deng; Changqian Zheng; Lianpeng Qiao; Yuping Wang; Chengliang Chai; Lei Cao

arXiv:2510.27253·cs.LG·November 3, 2025

Open Access 3 Reviews

TL;DR

This paper introduces Influence-Weighted Distillation (IWD), a novel framework that uses influence functions to assign adaptive weights to data instances, improving dataset distillation quality and model performance.

Contribution

The work presents a new influence-based weighting method for dataset distillation that accounts for data quality, enhancing effectiveness over traditional uniform approaches.

Findings

01

Improves distilled dataset quality and model accuracy by up to 7.8 percentage points.

02

Integrates seamlessly into existing distillation frameworks.

03

Prioritizes beneficial data and reduces the impact of harmful instances.

Abstract

Dataset distillation condenses large datasets into synthetic subsets, achieving performance comparable to training on the full dataset while substantially reducing storage and computation costs. Most existing dataset distillation methods assume that all real instances contribute equally to the process. In practice, real-world datasets contain both informative and redundant or even harmful instances, and directly distilling the full dataset without considering data quality can degrade model performance. In this work, we present Influence-Weighted Distillation (IWD), a principled framework that leverages influence functions to explicitly account for data quality in the distillation process. IWD assigns adaptive weights to each instance based on its estimated impact on the distillation objective, prioritizing beneficial data while downweighting less useful or harmful ones. Owing to its…
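The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a logistic-regression model, the classical influence-function approximation measured against a held-out validation loss (the paper measures impact on the distillation objective instead), and a hypothetical softmax mapping from scores to weights. All function names, the `damping` term, and the `temperature` parameter are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(theta, X, y):
    # Gradient of the logistic loss for each row: (p - y) * x.
    return (sigmoid(X @ theta) - y)[:, None] * X

def hessian(theta, X, damping=1e-2):
    # Mean Hessian of the logistic loss; damping (assumed) keeps it invertible.
    p = sigmoid(X @ theta)
    H = (X * (p * (1.0 - p))[:, None]).T @ X / len(X)
    return H + damping * np.eye(X.shape[1])

def influence_scores(theta, X_train, y_train, X_val, y_val):
    # score_i = -g_i^T H^{-1} g_val: first-order change in validation loss
    # when training instance i is upweighted. Negative means beneficial.
    g_val = per_example_grads(theta, X_val, y_val).mean(axis=0)
    s = np.linalg.solve(hessian(theta, X_train), g_val)
    return -per_example_grads(theta, X_train, y_train) @ s

def influence_weights(scores, temperature=1.0):
    # Hypothetical mapping: softmax over negated scores, rescaled to mean 1,
    # so beneficial instances get weights above 1 and harmful ones below 1.
    z = -scores / temperature
    z = z - z.max()
    w = np.exp(z)
    return w * len(w) / w.sum()

def weighted_gradient_matching_loss(theta, X_real, y_real, w, X_syn, y_syn):
    # Cosine distance between the weighted real gradient and the synthetic
    # gradient; uniform w recovers plain (unweighted) gradient matching.
    G = per_example_grads(theta, X_real, y_real)
    g_real = (w[:, None] * G).sum(axis=0) / w.sum()
    g_syn = per_example_grads(theta, X_syn, y_syn).mean(axis=0)
    denom = np.linalg.norm(g_real) * np.linalg.norm(g_syn) + 1e-12
    return 1.0 - float(g_real @ g_syn) / denom
```

In this sketch the weights enter only through the real-data gradient, which is one plausible way a weighting module could be attached to a gradient-matching method without changing the synthetic-data update itself.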

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 3

Strengths

1. The proposed IWD algorithm is lightweight and can be plugged into any existing dataset distillation algorithm.
2. The paper provides adequate mathematical justification for the design of its influence calculation.
3. The proposed IWD algorithm improves three existing dataset distillation algorithms on CIFAR-10, CIFAR-100, and SVHN, with the largest gain on gradient matching, from 44.9% to 52.7% at 10 IPC.

Weaknesses

1. While the paper provides a large number of baselines for comparison, the proposed algorithm is applied to only three of them: gradient matching, information-intensive dataset condensation, and progressive dataset distillation. Since the proposed method is supposed to augment existing methods, more comparisons are needed. For instance, trajectory matching is the most popular dataset distillation algorithm, yet no evaluation is reported of what happens when IWD is applied on top of it.
2. Followi

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. The paper addresses a fundamental yet often overlooked problem in dataset distillation, that different training instances contribute unequally to the quality of the distilled dataset. The motivation is clear and intuitive, highlighting a real limitation in existing approaches.
2. The proposed method consistently improves multiple baselines (DC, IDC, PDD) across standard benchmarks (CIFAR10/100, SVHN). These results demonstrate that the influence-based weighting mechanism effectively enhances

Weaknesses

1. The proposed IWD framework relies on costly influence function estimation involving Hessian–vector products and bi-level optimization, leading to heavy computation and poor scalability to large datasets like ImageNet-1k. It is also restricted to bi-level distillation frameworks (e.g., DD, DC, IDC, PDD) due to its dependence on outer-loop gradients, and cannot extend to more efficient uni-level methods (e.g., SRe2L [1], EDC [2], FADRM [3]). In contrast, Prune-then-Distill offers greater flexib

Reviewer 03 · Rating 4 · Confidence 5

Strengths

- IWD is designed as a plug-and-play module that can be combined with various distillation paradigms.
- The paper is easy to understand.

Weaknesses

- The proposed method is tested only on the small CIFAR10, CIFAR100, and SVHN datasets, not on ImageNet1k or ImageNet21k. Therefore, its generalization cannot be demonstrated.
- The paper does not measure the time and compute cost required to calculate the influence scores, so it cannot effectively illustrate the cost of the proposed plug-and-play method.
- The proposed method lacks innovation: influence functions have long been used in the field

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Data Quality and Management