Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation

Sheng-Feng Yu, Jia-Jiun Yao, and Wei-Chen Chiu

arXiv:2507.21455 · cs.CV · August 6, 2025


TL;DR

This paper introduces a novel self-supervised dataset distillation method that employs parameterization, predefined augmentation, and approximation techniques to create compact, representative datasets with improved generalization and transfer learning capabilities.

Contribution

The paper proposes a self-supervised dataset distillation approach that combines a basis parameterization of images and their representations, fixed (predefined) augmentations, and a lightweight approximation model to improve dataset compactness and cross-architecture generalization.
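
As a rough illustration of the parameterization and fixed augmentations, the NumPy sketch below stores distilled images and their target representations as linear combinations of shared bases, then expands every image with a fixed list of augmentations rather than randomly sampled ones. Everything concrete here is an assumption for illustration, not the authors' implementation: the sizes, the use of separate coefficients for images and for representations, the choice of flips and rotations as the predefined augmentations, and the helper names `reconstruct`, `augment_all`, and `param_count`. The lightweight approximation model is not sketched.

```python
import numpy as np

# Hypothetical sizes (not from the paper): N distilled samples, K shared bases,
# images of shape (C, H, W), target representations of dimension D.
N, K = 10, 4
C, H, W = 3, 8, 8
D = 16

rng = np.random.default_rng(0)

# Assumed parameterization: shared bases plus per-sample coefficients.
# In a real distillation loop these arrays would be optimized; here they are random.
image_bases = rng.standard_normal((K, C * H * W))
image_coeffs = rng.standard_normal((N, K))
repr_bases = rng.standard_normal((K, D))
repr_coeffs = rng.standard_normal((N, K))


def reconstruct(image_coeffs, image_bases, repr_coeffs, repr_bases):
    """Rebuild distilled images and target representations as linear combinations of bases."""
    images = (image_coeffs @ image_bases).reshape(-1, C, H, W)
    reprs = repr_coeffs @ repr_bases
    return images, reprs


# Assumed predefined augmentations: a fixed, deterministic list applied to every image
# (hypothetical choice of transforms; H == W so rotations keep the shape).
PREDEFINED_AUGS = [
    lambda x: x,                                 # identity
    lambda x: np.flip(x, axis=-1),               # horizontal flip
    lambda x: np.rot90(x, k=1, axes=(-2, -1)),   # 90-degree rotation
    lambda x: np.rot90(x, k=2, axes=(-2, -1)),   # 180-degree rotation
]


def augment_all(images):
    """Apply every predefined augmentation to every image; the output order is fixed."""
    return np.concatenate([np.stack([aug(img) for img in images]) for aug in PREDEFINED_AUGS])


def param_count():
    """Stored values under the basis form vs. storing images and representations directly."""
    basis = image_bases.size + repr_bases.size + image_coeffs.size + repr_coeffs.size
    direct = N * (C * H * W + D)
    return basis, direct


images, reprs = reconstruct(image_coeffs, image_bases, repr_coeffs, repr_bases)
augmented = augment_all(images)
print(images.shape, reprs.shape, augmented.shape)  # (10, 3, 8, 8) (10, 16) (40, 3, 8, 8)
print(param_count())                               # (912, 2080)
```

With these toy sizes the basis form stores 912 values versus 2,080 for storing the samples directly; the saving depends on N and K. Because the augmentation list is fixed, calling `augment_all` twice on the same images yields identical outputs, which is the property that predefining augmentations is meant to provide.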

Findings

1. Outperforms existing methods in distillation efficiency
2. Improves cross-architecture generalization
3. Enhances transfer learning performance

Abstract

Although larger datasets are crucial for training large deep models, the rapid growth of dataset size brings considerable training costs, which can become computationally prohibitive. Dataset distillation has recently become a popular technique for reducing dataset size by learning a highly compact set of representative exemplars; ideally, a model trained on these exemplars performs comparably to one trained on the full dataset. While most existing work on dataset distillation focuses on supervised datasets, we instead aim to distill images together with their self-supervisedly trained representations into a distilled set. This procedure, named Self-Supervised Dataset Distillation, effectively extracts rich information from real datasets, yielding distilled sets with enhanced…

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

The key strengths of this paper include:
1. More diverse datasets: not many dataset distillation papers venture beyond the CIFAR/ImageNet datasets; however, these authors included results on CUB2011 and StanfordDogs. Additionally, ViT performance is reported, and overall the authors' performance improvement appears to be maintained on Transformer architectures, albeit smaller.
2. The basis and coefficient initialization ablation provides interesting insight into the sensitivity of…

Weaknesses

Despite the interesting approach taken in this work, I find a few crucial weaknesses:
1. The experimental support is somewhat lacking. As is common in dataset distillation work, it is good practice to show scaling over different memory budgets (N) on several datasets, rather than on a single dataset, to demonstrate generalizability.
2. The ImageNet experiments only go up to 64 × 64 resolution; however, the field has recently shifted to higher resolutions such as 1…

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. Strategic parameterization: the use of bases to parameterize images and representations is a sophisticated way to compress dataset information without sacrificing accuracy, addressing both storage efficiency and computational cost.
2. Effective augmentation handling: by predefining augmentations, the method mitigates the bias introduced by random augmentations, a notable challenge for SSL distillation methods.
3. Improved Memory Efficie…

Weaknesses

1. Complexity and accessibility. Critique: the method involves several sophisticated techniques, including low-dimensional basis parameterization, predefined augmentations, and approximation networks. This complexity may make it difficult for practitioners to implement and tune the method without extensive expertise in self-supervised learning and dataset distillation.
2. Computational and memory trade-offs. Critique: while the method claims to be memory-efficient due to its approximation networks,…

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The topic is both valuable and practical, especially in the era of large datasets. While most current research on data distillation focuses on classification, which may be too narrow a scope, this work targets self-supervised learning. This approach is more general and can better support feature learning for downstream applications.
2. The paper is well written and easy to follow, with a straightforward method that is simple to understand. For each component, the authors clearl…

Weaknesses

I did not find any major weaknesses in this paper. However, there are some concerns regarding its novelty. The techniques employed are largely derived from previous work on data distillation for classification tasks. It would be helpful if the authors could clarify what unique challenges exist for self-supervised data distillation and how their method specifically addresses those challenges.

Videos

Boost Self-Supervised Dataset Distillation via Parameterization, Predefined Augmentation, and Approximation (SlidesLive)