Dataset Distillation in Latent Space
Yuxuan Duan, Jianfu Zhang, Liqing Zhang

TL;DR
This paper moves dataset distillation from pixel space into the latent space of a pretrained autoencoder, reducing time and memory costs and increasing info-compactness while keeping performance comparable to pixel-space methods, which makes distillation of high-resolution datasets feasible.
Contribution
The work pioneers moving dataset distillation from pixel space to latent space, significantly improving efficiency and scalability of existing DD algorithms.
Findings
Reduced time and space complexity in DD methods
Achieved comparable performance to pixel-based methods
Enabled distillation of high-resolution datasets
Abstract
Dataset distillation (DD) is a newly emerging research area aiming at alleviating the heavy computational load in training models on large datasets. It tries to distill a large dataset into a small and condensed one so that models trained on the distilled dataset can perform comparably with those trained on the full dataset when performing downstream tasks. Among the previous works in this area, there are three key problems that hinder the performance and availability of the existing DD methods: high time complexity, high space complexity, and low info-compactness. In this work, we simultaneously attempt to settle these three problems by moving the DD processes from conventionally used pixel space to latent space. Encoded by a pretrained generic autoencoder, latent codes in the latent space are naturally info-compact representations of the original images in much smaller sizes. After…
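The idea in the abstract can be illustrated with a toy sketch: encode real images with a frozen autoencoder, optimize a few synthetic latent codes against the real latents, and decode only when pixels are needed. Everything below is an assumption made for illustration: the shapes are hypothetical, the "autoencoder" is a random linear map with its pseudo-inverse as decoder, and the objective is a simple mean-matching loss rather than any of the DD algorithms the paper transfers to latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes, chosen for illustration (not taken from the paper).
IMG_SHAPE = (3, 32, 32)     # one pixel-space image: 3072 values
LATENT_SHAPE = (4, 8, 8)    # one latent code: 256 values
IMG_DIM = int(np.prod(IMG_SHAPE))
LATENT_DIM = int(np.prod(LATENT_SHAPE))


def latents_per_image_budget(img_dim: int, latent_dim: int) -> int:
    """Number of latent codes that fit in the storage of one image."""
    return img_dim // latent_dim


# Stand-in for a frozen, pretrained autoencoder: a random linear encoder
# with its pseudo-inverse as decoder. A real autoencoder is nonlinear.
W_enc = rng.standard_normal((LATENT_DIM, IMG_DIM)) / np.sqrt(IMG_DIM)
W_dec = np.linalg.pinv(W_enc)


def encode(x: np.ndarray) -> np.ndarray:
    """Map flattened images (n, IMG_DIM) to latents (n, LATENT_DIM)."""
    return x @ W_enc.T


def decode(z: np.ndarray) -> np.ndarray:
    """Map latents (n, LATENT_DIM) back to flattened images (n, IMG_DIM)."""
    return z @ W_dec.T


def distill_latents(real_z, n_codes, steps=300, lr=0.25):
    """Learn n_codes synthetic latents whose mean matches the real latent mean.

    This mean-matching loss is a deliberately simplified placeholder objective.
    """
    target = real_z.mean(axis=0)
    syn = rng.standard_normal((n_codes, real_z.shape[1]))
    losses = []
    for _ in range(steps):
        diff = syn.mean(axis=0) - target
        losses.append(float(diff @ diff))
        # Gradient of ||mean(syn) - target||^2 w.r.t. each row is 2*diff/n_codes.
        syn -= lr * (2.0 / n_codes) * diff
    return syn, losses


# Random data standing in for the images of one class.
real_images = rng.standard_normal((500, IMG_DIM))
real_z = encode(real_images)          # the encoder runs once, up front

n = latents_per_image_budget(IMG_DIM, LATENT_DIM)
syn_z, losses = distill_latents(real_z, n_codes=n)
decoded = decode(syn_z).reshape(n, *IMG_SHAPE)

print("latent codes per image budget:", n)
print("same storage as one image:", syn_z.size == IMG_DIM)
print("loss decreased:", losses[-1] < losses[0])
```

With these hypothetical shapes, the synthetic latents together occupy exactly the storage of one image, and every optimization step touches 256-dimensional codes instead of 3072-dimensional images. That is the kind of time and space saving the abstract attributes to working in latent space.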
Peer Reviews
Decision · ICLR 2024 Conference Withdrawn Submission
This paper shows that performing distillation in latent space costs fewer resources than distillation in pixel space, without sacrificing much performance.
Most comparisons in this paper are UNFAIR. In the dataset distillation area, previous works compare performance under the same IPC (images per class) settings, which means that the AMOUNT of distilled images fed into the evaluation network is fixed. This paper proposes ‘LPC’ (latents per class) and claims that 1 IPC = 12 LPC since their sizes are the same (latent codes have lower resolution). The authors then compare their method’s performance (12*n LPC) with previous works (n IPC), which means
The paper identifies three primary challenges in dataset distillation: high time complexity, high space complexity, and the retention of unnecessary high-frequency information. The authors claim to introduce a pioneering framework that directly addresses these issues by conducting dataset distillation in the latent space, rather than the pixel space.
● The authors assert that they are the first to successfully address these three challenges in dataset distillation. What specific limitations or hindrances have prevented existing works from generalizing solutions to these problems? Is the proposed method the sole solution, or are there alternative approaches that merit consideration?
● I've noticed that the paper exclusively presents performance experiments on the Sub-ImageNet dataset. Given the existence of prior works that have addressed th
This paper is well-motivated and well-organized. The observation that "distilling dataset in the original space (e.g. pixel space for image datasets) will inevitably condense high-frequency detailed information into limited storage budget, which is usually unnecessary for downstream tasks" is a solid point to serve as motivation for the method design. The authors also provide comprehensive experiments on various datasets.
1. While this method has demonstrated its effectiveness for high-resolution dataset distillation, there are no experiments or result comparisons on lower-resolution datasets such as CIFAR10/100. This raises the concern of whether using an autoencoder from Stable Diffusion for DD impacts distillation performance on such datasets.
2. To my knowledge, coreset selection does not belong to dataset distillation, and dataset distillation usually refers to the optimization-based methods that distill
+ The proposed method tackles the DD problem from another angle. Latent code distillation makes a lot of sense in terms of efficiency and can potentially help the field on larger datasets.
+ The authors demonstrate that the proposed method can indeed achieve decent performance with good efficiency.
+ The authors' writing is clear and easy to follow.
- The algorithm seems to depend heavily on the quality of the pretrained autoencoder, which adds another layer of complexity to the distillation procedure.
- In a more general setting, such as language or other modalities where AEs are not that popular, the proposed method may be limited in contribution or usage.
- It seems that the authors focus on only a subset of DD algorithms. How would latent DD perform using FRePo [1] or momentum-based BPTT [2]? It would be nice if authors can add the
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
