Heavy Labels Out! Dataset Distillation with Label Space Lightening

Ruonan Yu; Songhua Liu; Zigeng Chen; Jingwen Ye; Xinchao Wang

arXiv:2408.08201 · cs.CV · August 16, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces HeLlO, a label-lightening framework for dataset distillation that generates synthetic labels online, drastically reducing storage needs while maintaining high performance on large-scale datasets.

Contribution

The paper proposes a novel label-lightening approach using open-source foundation models and low-rank fine-tuning to generate synthetic labels efficiently during dataset distillation.
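
As a rough illustration of what low-rank fine-tuning means here, the sketch below applies a LoRA-style update to a single frozen weight matrix: the effective weight is `W + (alpha / r) * B @ A`, and only `A` and `B` are trained and stored. This is a generic sketch under stated assumptions, not the paper's implementation; the function names (`lora_forward`, `make_lora`, `lora_param_count`), the initialization, and the layer sizes are all hypothetical.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def make_lora(d_out, d_in, r, seed=0):
    """Create low-rank factors: A is r x d_in (small random), B is d_out x r (zeros).

    Zero-initializing B is an assumed, commonly used choice: the adapted
    layer then starts out identical to the frozen layer.
    """
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_forward(W, A, B, x, alpha=1.0):
    """Compute (W + (alpha / r) * B @ A) @ x without forming the full update."""
    r = len(A)
    base = matvec(W, x)            # frozen pretrained path
    low = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low)]


def lora_param_count(d_out, d_in, r):
    """Number of trainable values stored for one adapted layer."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only to show the scale of the saving.
full_params = 768 * 768
lora_params = lora_param_count(768, 768, r=8)
```

With these assumed sizes, the low-rank factors hold `lora_params` values instead of `full_params`, which is why storing a low-rank adapter alongside a public foundation model can be far cheaper than storing a fully fine-tuned copy.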

Findings

01

Achieves performance comparable to current state-of-the-art dataset distillation methods with only about 0.003% of the storage required for the complete set of soft labels.

02

Uses foundation models like CLIP for label generation.

03

Demonstrates effectiveness on large-scale datasets.

Abstract

Dataset distillation or condensation aims to condense a large-scale training dataset into a much smaller synthetic one such that the training performance of the distilled and original sets on neural networks is similar. Although the number of training samples can be reduced substantially, current state-of-the-art methods rely heavily on enormous soft labels to achieve satisfactory performance. As a result, the required storage can be comparable even to that of the original datasets, especially for large-scale ones. To solve this problem, instead of storing these heavy labels, we propose a novel label-lightening framework termed HeLlO, which aims at effective image-to-label projectors with which synthetic labels can be generated online directly from synthetic images. Specifically, to construct such projectors, we leverage prior knowledge in open-source foundation models, e.g., CLIP, and introduce a…
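
To make the idea of an image-to-label projector concrete, here is a minimal sketch of how a CLIP-style model could turn an image feature into a soft label on the fly: compare the normalized image feature with one normalized embedding per class, scale the cosine similarities by a temperature, and apply a softmax. This is only an illustration of the general mechanism, not the authors' projector; the function names, the toy 3-dimensional features, and the temperature value are assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_soft_label(image_feature, class_embeddings, temperature=0.07):
    """Map one image feature to a probability vector over classes.

    `temperature` is an illustrative value, not one taken from the paper.
    """
    img = l2_normalize(image_feature)
    logits = []
    for emb in class_embeddings:
        txt = l2_normalize(emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(cosine / temperature)
    return softmax(logits)


# Toy stand-ins for encoder outputs: three classes, 3-dimensional features.
class_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
soft_label = project_to_soft_label([0.9, 0.3, 0.1], class_embeddings)
```

Because the soft label is recomputed from the image whenever it is needed, only the projector has to be stored rather than a label vector for every synthetic sample.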

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 3

Strengths

1. The motivation of this paper is clear and easy to understand.
2. The proposed method is simple and effective.

Weaknesses

I am convinced of the motivation of this paper, but there are significant concerns about using CLIP to generate labels from images:
1. Why not use stronger supervised or self-supervised models? If we can use another, stronger model like CLIP to generate labels from images, why not use it directly for the target task, or replace CLIP (weakly supervised) with other models such as a supervised ViT or the self-supervised DINOv2? The authors fine-tune the CLIP vision encoder with LoRA, and text embeddings to adapt…

Reviewer 02 · Rating 5 · Confidence 3

Strengths

The work effectively tackles the critical issue of high storage requirements for soft labels.

Weaknesses

- The comparison presented in Table 1 appears unfair. Unlike the baseline methods, the proposed method requires storing a teacher model (or its low-rank version), which complicates a direct comparison. It is unclear why the storage size of the teacher model is relevant here. Please provide a more comprehensive comparison that includes the storage requirements for both the teacher model and the labels across all methods.
- Table 1 contains numerous missing values, which hinders a comprehensive co…

Reviewer 03 · Rating 6 · Confidence 5

Strengths

I do think this paper approaches a problem that isn't really being avidly researched in the field of data distillation. In particular, most works aim to reduce the costs associated with the number of images, or with the learning paradigms for distillation. However, this work looks at an add-on to existing distillation techniques in an effort to reduce the storage requirements of the labels. From a novelty perspective, I would agree that this paper incorporates new or innovative techniques into solvin…

Weaknesses

Despite the interesting approach taken in this paper, I find there to be a few crucial weaknesses that may overshadow the benefit of the approach. Data distillation computational costs are often computed in terms of the size or number of images, as these often take up more storage space than the labels. Understandably, at lower image/class ratios this distribution may deviate (as soft labels would remain scaled at the number of classes, say 1K on ImageNet); however, I am not convinced that the storag…

Taxonomy

Topics: Machine Learning and Data Classification

Methods: bye · Sparse Evolutionary Training · Contrastive Language-Image Pre-training