TL;DR
This paper introduces a self-supervised dataset distillation method that combines low-dimensional basis parameterization, predefined augmentations, and approximation networks to create compact, representative datasets with improved generalization and transfer learning performance.
Contribution
It proposes a new approach to self-supervised dataset distillation that uses basis parameterization of images and representations, fixed augmentations, and a lightweight approximation model to improve dataset compactness and cross-architecture generalization.
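To make the parameterization idea concrete, here is a minimal sketch of storing a distilled set as shared bases plus per-sample coefficients, rather than as raw pixels. It is an illustration only, not the authors' implementation: the names `reconstruct` and `storage_counts`, the linear-combination form, and all sizes (100 samples, 8 bases, 3x32x32 images) are assumptions chosen for the example, and the values are random rather than learned.

```python
import numpy as np


def reconstruct(coeffs: np.ndarray, bases: np.ndarray) -> np.ndarray:
    """Rebuild flattened samples as linear combinations of shared bases.

    coeffs: (n_samples, n_bases), bases: (n_bases, dim) -> (n_samples, dim).
    Hypothetical form; the paper's actual parameterization may differ.
    """
    return coeffs @ bases


def storage_counts(n_samples: int, n_bases: int, dim: int) -> tuple[int, int]:
    """Return (direct, factored) counts of stored floats."""
    direct = n_samples * dim                         # one full vector per sample
    factored = n_bases * dim + n_samples * n_bases   # bases + coefficients
    return direct, factored


# Illustrative sizes only (assumptions, not values from the paper).
n_samples, n_bases, dim = 100, 8, 3 * 32 * 32

rng = np.random.default_rng(0)
bases = rng.standard_normal((n_bases, dim))         # would be learned in practice
coeffs = rng.standard_normal((n_samples, n_bases))  # would be learned in practice

images = reconstruct(coeffs, bases).reshape(n_samples, 3, 32, 32)
direct, factored = storage_counts(n_samples, n_bases, dim)
```

With these assumed sizes, the factored form stores 25,376 floats instead of 307,200. The cost is that every reconstructed sample lies in the span of the 8 bases, which is the compactness-versus-expressiveness trade-off such a parameterization has to manage.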
Findings
Outperforms existing methods in distillation efficiency
Improves cross-architecture generalization
Enhances transfer learning performance
Abstract
Although larger datasets are crucial for training large deep models, the rapid growth of dataset size has made training costs considerable, and in some cases prohibitive. Dataset Distillation has recently become a popular technique for reducing dataset size by learning a highly compact set of representative exemplars, such that a model trained on these exemplars ideally performs comparably to one trained on the full dataset. While most existing work on dataset distillation focuses on supervised datasets, we instead aim to distill images and their self-supervisedly trained representations into a distilled set. This procedure, named Self-Supervised Dataset Distillation, effectively extracts rich information from real datasets, yielding the distilled sets with enhanced…
Peer Reviews
Decision: ICLR 2025 Poster
The key strengths of this paper include: 1. More diverse datasets: Not many dataset distillation papers venture beyond the CIFAR/ImageNet datasets; however, these authors included results on CUB2011 and StanfordDogs. Additionally, ViT performance has been reported, and overall it appears that the authors' performance improvement is maintained on Transformer architectures, albeit smaller. 2. The basis and coefficient initialization ablation provides interesting insight into the sensitivity of
Despite the interesting approach taken in this work, I find a few crucial weaknesses: 1. I find that the experimental support is a bit lacking. As is common in Dataset Distillation works, it is generally good practice to show the scaling over different memory budgets (N) on various datasets, rather than just a single dataset, in order to show generalizability. 2. I noticed that the resolutions on ImageNet scale to 64 x 64; however, the field has recently shifted to higher resolutions such as 1
1. This paper demonstrates a very strategic parameterization. The use of bases for image and representation parameterization is a sophisticated approach to compressing dataset information without sacrificing accuracy. This addresses both storage efficiency and computational cost. 2. Effective Augmentation Handling: By predefining augmentations, the method successfully mitigates the bias introduced by random augmentations, a notable challenge in SSL distillation methods. 3. Improved Memory Efficie
1. Complexity and accessibility. Critique: The method involves several sophisticated techniques, including low-dimensional basis parameterization, predefined augmentations, and approximation networks. This complexity may make it difficult for practitioners to implement and tune the method without extensive expertise in self-supervised learning and dataset distillation. 2. Computational and memory trade-offs. Critique: While the method claims to be memory-efficient due to approximation networks,
1. The topic is both valuable and practical, especially in the era of large datasets. While most current research on data distillation focuses primarily on classification tasks, which may be too narrow, this work seeks to improve self-supervised tasks. This approach is more general and can better support feature learning for downstream applications. 2. The paper is well-written and easy to follow, with a straightforward method that is simple to understand. For each component, the authors clearl
I did not find any major weaknesses in this paper. However, there are some concerns regarding its novelty. The techniques employed are largely derived from previous work on data distillation for classification tasks. It would be helpful if the authors could clarify what unique challenges exist for self-supervised data distillation and how their method specifically addresses those challenges.