DRUPI: Dataset Reduction Using Privileged Information

Shaobo Wang; Youxin Jiang; Tianle Niu; Yantai Yang; Ruiji Zhang; Shuhao Hu; Shuaiyu Zhang; Chenghao Sun; Weiya Li; Conghui He; Xuming Hu; Linfeng Zhang

arXiv:2410.01611·cs.CV·March 11, 2026

Open Access · 5 Reviews

TL;DR

This paper introduces DCPI, a dataset condensation method that synthesizes privileged information like feature or attention labels alongside reduced datasets, leading to improved model training and performance on large image datasets.

Contribution

The paper proposes a novel dataset condensation approach that incorporates privileged information, enhancing existing methods and demonstrating significant performance improvements.

Findings

01

Effective feature labels must balance discrimination and diversity.

02

DCPI significantly improves performance on ImageNet-1K, CIFAR-10/100, and Tiny ImageNet.

03

Privileged information enhances dataset condensation efficacy.

Abstract

Dataset Condensation (DC) seeks to select or distill samples from large datasets into smaller subsets while preserving performance on target tasks. Existing methods primarily focus on pruning or synthesizing data in the same format as the original dataset, typically being the input data and corresponding labels. However, in DC settings, we find it is possible to synthesize more information beyond the data-label pair as an additional learning target to facilitate model training. In this paper, we introduce Dataset Condensation using Privileged Information (DCPI), which enriches DC by synthesizing privileged information alongside the reduced dataset. This privileged information can take the form of feature labels or attention labels, providing auxiliary supervision to improve model learning. Our findings reveal that effective feature labels must balance between being overly discriminative…
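To make the idea of auxiliary supervision concrete, the following is a minimal sketch and not the authors' implementation. It assumes each condensed sample carries a feature label (a plain vector here) in addition to its class label, and that training adds a mean-squared-error term between the model's intermediate features and that feature label, as the reviews on this page describe. The function names and the `weight` coefficient are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` against the class index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def privileged_loss(logits, label, features, feature_label, weight=1.0):
    """Task loss plus an auxiliary term that pulls the model's features
    toward the synthesized feature label (assumed form, `weight` is hypothetical)."""
    return softmax_cross_entropy(logits, label) + weight * mse(features, feature_label)


# Toy example: two classes, two-dimensional features.
loss = privileged_loss(
    logits=[0.0, 0.0],
    label=0,
    features=[1.0, 2.0],
    feature_label=[1.0, 4.0],
    weight=0.5,
)
print(loss)
```

Setting `weight` to zero recovers ordinary training on the data-label pair, which is one way to read the abstract's claim that privileged information is an additional learning target rather than a replacement for the label.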

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 5

Strengths

1. The use of feature labels and attention labels as privileged information is straightforward and easy to implement.

Weaknesses

1. The paper does not sufficiently acknowledge prior work. For instance, it should compare and discuss the proposed method with previous dataset distillation methods that use data-soft label structures, which are similar in approach.
2. The novelty of the privileged information (feature and attention labels derived from feature labels) is limited and similar to approaches such as Re-labeling ImageNet and FKD.
3. The theoretical analysis using VC theory is general and not specific to the proposed

Reviewer 02 · Rating 5 · Confidence 4

Strengths

[S1] The paper presents the use of privileged information, such as feature and attention labels, to improve dataset reduction, demonstrating performance gains on datasets like CIFAR-10/100 and ImageNet.
[S2] The approach integrates well with existing dataset reduction methods, highlighting its flexibility.
[S3] It is supported by a solid theoretical foundation based on VC theory, which explains how privileged information enhances generalization and learning.
[S4] The paper is well-written and

Weaknesses

[W1] The paper lacks a thorough analysis of computational costs, which is critical for assessing the method’s efficiency. I suggest reporting specific metrics such as training time, memory usage, and storage overhead when generating and storing privileged information. It would be helpful to compare these costs with baseline methods to provide a clear understanding of DRUPI’s performance in terms of resource consumption. This analysis will clarify the computational trade-offs involved in using pr
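The storage overhead raised in this review can be estimated with simple arithmetic. The sketch below is illustrative only: the image shape, the feature-label shape, and the 4-byte element size are hypothetical values chosen for the example, not figures reported in the paper.

```python
def storage_bytes(shape, bytes_per_element):
    """Bytes needed to store a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def overhead_ratio(image_shape, feature_shape,
                   image_bytes_per_element=4, feature_bytes_per_element=4):
    """Extra storage for one feature label, relative to the image it accompanies."""
    image = storage_bytes(image_shape, image_bytes_per_element)
    feature = storage_bytes(feature_shape, feature_bytes_per_element)
    return feature / image


# Hypothetical sizes: a 3x32x32 float32 image and a 128x4x4 float32 feature label.
image_size = storage_bytes((3, 32, 32), 4)      # 12288 bytes
feature_size = storage_bytes((128, 4, 4), 4)    # 8192 bytes
print(image_size, feature_size, overhead_ratio((3, 32, 32), (128, 4, 4)))
```

Under these assumed shapes, each feature label adds about two thirds of the image's own storage, so the overhead depends heavily on which layer the feature label is taken from and how many labels are kept per image.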

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. All the charts in the text are clear and easy to understand.
2. The theoretical analysis further supports the proposed method.
3. The distilled dataset can generalize well across various architectures.

Weaknesses

1. Some details of the method are not clarified, especially on the pruning part. The authors described the algorithm in the context of DC but have yet to explain how pruning can incorporate learnable feature labels.
2. The analysis in Section 5 is trivial and doesn't provide enough insights. More discussion can be included.
3. The scalability to large IPCs may not be good enough according to the reported results in Table 2 and the Appendix.

For the weaknesses mentioned above, I'll raise the cor

Reviewer 04 · Rating 5 · Confidence 4

Strengths

The introduction of feature labels as privileged information goes beyond the traditional data-label paradigm, providing additional supervision that improves model generalization and robustness. The paper conducts an extensive set of experiments across a variety of datasets (CIFAR-10/100, Tiny ImageNet, and ImageNet subsets) and methods (coreset selection and dataset distillation), clearly demonstrating the efficacy of DRUPI in improving performance. The authors provide a rigorous theoretical f

Weaknesses

DRUPI uses the output of a pre-trained model as privileged information and further applies an MSE loss to enforce consistency between the model’s predictions and the privileged information. This approach shares similarities with knowledge distillation techniques. While the method is straightforward and yields competitive results, it inherently limits the upper bound of DRUPI's performance to the quality of the pre-trained model. Consequently, if the pre-trained model is suboptimal, the privilege

Reviewer 05 · Rating 3 · Confidence 4

Strengths

1. The core contribution of this paper is the introduction of additional supervised information, specifically privileged information, alongside input data and soft labels. This paradigm effectively extends the capabilities of the dataset condensation algorithm.
2. The authors validate DRUPI on multiple datasets and demonstrate its integration with several classical dataset condensation algorithms, confirming its generalizability to some extent.

Weaknesses

Overall, the biggest drawback of this work is the one I mentioned in the Summary: the generalization of this paradigm. Does it violate the concept of dataset condensation? Is the additional storage overhead introduced acceptable? Besides that, some expressions in the paper need to be improved.

**Presentation:**
1. The concept of dataset distillation is misunderstood. The authors classify dataset reduction into coreset selection and dataset distillation (line 125). However, this raises the question: h

Taxonomy

Topics: Time Series Analysis and Forecasting · Machine Learning and Data Classification · Neural Networks and Applications

Methods: Softmax · Attention Is All You Need · Pruning · Focus