Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu, Songhua Liu, Zigeng Chen, Jingwen Ye, Xinchao Wang

TL;DR
This paper introduces HeLlO, a label-lightening framework for dataset distillation that generates synthetic labels online, drastically reducing storage needs while maintaining high performance on large-scale datasets.
Contribution
The paper proposes a novel label-lightening approach using open-source foundation models and low-rank fine-tuning to generate synthetic labels efficiently during dataset distillation.
Findings
Achieves performance comparable to existing methods while using only about 0.003% of the storage required for the original soft labels.
Uses foundation models like CLIP for label generation.
Demonstrates effectiveness on large-scale datasets.
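To make the storage argument concrete, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration, not the paper's accounting: it assumes one soft label is stored per synthetic image per training epoch, and every number (image count, epochs, layer sizes, rank, 2-byte values) is a hypothetical placeholder rather than a configuration reported by the authors. The helper names are likewise invented for this example.

```python
def soft_label_bytes(num_images, num_classes, num_epochs, bytes_per_value=2):
    """Bytes needed to store one soft label per image per epoch (assumed scheme)."""
    return num_images * num_classes * num_epochs * bytes_per_value


def low_rank_bytes(num_layers, dim_in, dim_out, rank, bytes_per_value=2):
    """Bytes for low-rank factors A (rank x dim_in) and B (dim_out x rank) per layer."""
    return num_layers * rank * (dim_in + dim_out) * bytes_per_value


# Hypothetical setting: 1,000 classes, 10 images per class, 300 epochs.
labels = soft_label_bytes(num_images=10_000, num_classes=1_000, num_epochs=300)

# Hypothetical adapter: rank-4 factors on 12 layers of width 768.
adapter = low_rank_bytes(num_layers=12, dim_in=768, dim_out=768, rank=4)

ratio_percent = 100 * adapter / labels
print(f"stored soft labels: {labels / 1e9:.1f} GB")
print(f"low-rank factors:   {adapter / 1e3:.1f} KB")
print(f"ratio:              {ratio_percent:.4f}%")
```

Under these assumed numbers the stored labels grow with images × classes × epochs, while the low-rank factors have a fixed size. Note that this sketch leaves out the frozen foundation model itself, which is the storage question one of the reviewers raises about Table 1.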
Abstract
Dataset distillation or condensation aims to condense a large-scale training dataset into a much smaller synthetic one such that the training performance of the distilled and original sets on neural networks is similar. Although the number of training samples can be reduced substantially, current state-of-the-art methods heavily rely on enormous soft labels to achieve satisfactory performance. As a result, the required storage can be comparable even to original datasets, especially for large-scale ones. To solve this problem, instead of storing these heavy labels, we propose a novel label-lightening framework termed HeLlO aiming at effective image-to-label projectors, with which synthetic labels can be directly generated online from synthetic images. Specifically, to construct such projectors, we leverage prior knowledge in open-source foundation models, e.g., CLIP, and introduce a…
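The idea of an image-to-label projector can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a CLIP-style setup in which an image feature is compared with per-class text embeddings by cosine similarity, and a frozen weight matrix is adapted with a low-rank update. All names (`lora_linear`, `soft_label`), the temperature, and the tiny 3-dimensional vectors are hypothetical and chosen only so the example runs with the standard library.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_linear(x, W, A, B, alpha=1.0):
    """Compute W x + (alpha / rank) * B (A x); W is frozen, A and B are the low-rank factors."""
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / rank) * u for b, u in zip(base, update)]


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def soft_label(image_feature, class_embeddings, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities to each class embedding."""
    img = normalize(image_feature)
    logits = [
        sum(a * b for a, b in zip(img, normalize(c))) / temperature
        for c in class_embeddings
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: identity "encoder" weight, rank-1 adapter with B initialised to zero.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.1, 0.0, 0.0]]        # shape: rank x dim_in
B = [[0.0], [0.0], [0.0]]    # shape: dim_out x rank
x = [0.9, 0.1, 0.0]          # stand-in for a synthetic image's input feature
class_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

feature = lora_linear(x, W, A, B)
label = soft_label(feature, class_embeddings)
print(label)
```

Because the label is recomputed from the image whenever it is needed, only the small factors `A` and `B` (and, in this sketch, the class embeddings) have to be kept alongside the frozen model, rather than a table of precomputed soft labels.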
Peer Reviews
Decision·Submitted to ICLR 2025
1. The motivation of this paper is clear and easy to understand. 2. The proposed method is simple and effective.
I am convinced of the motivation of this paper, but there are significant concerns on using CLIP to generate labels from images: 1. Why not use stronger supervised models or self-supervised models? If we can use another stronger model like CLIP to generate labels from images, why not use it directly for the target task, or replace CLIP (weakly-supervised) with other models like supervised ViT or self-supervised DINOv2? Authors finetune CLIP vision encoder with LoRA, and text embeddings to adapt
The work effectively tackles the critical issue of high storage requirements for soft labels.
- The comparison presented in Table 1 appears unfair. Unlike the baseline methods, the proposed method requires storing a teacher model (or its low rank version), which complicates a direct comparison. It is unclear why the storage size of the teacher model is relevant here. Please provide a more comprehensive comparison that includes the storage requirements for both the teacher model and the labels across all methods. - Table 1 contains numerous missing values, which hinders a comprehensive co
I do think this paper approaches a problem that isn't really being avidly researched in the field of data distillation. In particular, most works aim to reduce the costs associated with the number of images, or the learning paradigms for distillation. However, this work looks at an add on to existing distillation techniques in an effort to reduce the storage requirements of the labels. From a novelty perspective, I would agree that this paper incorporates new or innovative techniques into solvin
Despite the interesting approach taken in this paper, I find there to be a few crucial weaknesses that may overshadow the benefit of the approach. Data Distillation computational costs are often computed regarding the size/number of images as these often take up more storage space than the labels. Understandably at lower image/class ratios, this distribution may deviate (as soft labels would remain scaled at the number of classes, say 1K on ImageNet) -- however I am not convinced that the storag
Taxonomy
Topics: Machine Learning and Data Classification
Methods: bye · Sparse Evolutionary Training · Contrastive Language-Image Pre-training
