Towards Principled Dataset Distillation: A Spectral Distribution Perspective
Ruixi Wu, Shaobo Wang, Jiahuan Chen, Zhiyuan Liu, Yicun Yang, Zhaorun Chen, Zekai Li, Kaixin Li, Xinming Wang, Hongzhu Yi, Kai Wang, Linfeng Zhang

TL;DR
This paper introduces a spectral distribution matching approach for dataset distillation that effectively handles class imbalance and long-tailed data, leading to improved synthetic dataset quality and stability.
Contribution
We propose Class-Aware Spectral Distribution Matching (CSDM), a novel spectral approach that addresses distribution discrepancy and class imbalance in dataset distillation.
Findings
Achieves 14.0% improvement on CIFAR-10-LT over state-of-the-art methods.
Incurs only a 5.7% performance drop when the number of tail-class images decreases from 500 to 25.
Demonstrates strong stability on long-tailed datasets.
Abstract
Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic counterparts for efficient model training. However, existing DD methods exhibit substantial performance degradation on long-tailed datasets. We identify two fundamental challenges: heuristic design choices for distribution discrepancy measure and uniform treatment of imbalanced classes. To address these limitations, we propose Class-Aware Spectral Distribution Matching (CSDM), which reformulates distribution alignment via the spectrum of a well-behaved kernel function. This technique maps the original samples into frequency space, resulting in the Spectral Distribution Distance (SDD). To mitigate class imbalance, we exploit the unified form of SDD to perform amplitude-phase decomposition, which adaptively prioritizes the realism in tail classes. On CIFAR-10-LT, with 10 images per class, CSDM achieves…
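To make the abstract's two ideas concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes a Gaussian (RBF) kernel, whose spectral measure is itself Gaussian by Bochner's theorem, and estimates the discrepancy by comparing empirical characteristic functions at sampled frequencies. The names `sample_frequencies`, `empirical_cf`, `spectral_distance`, and `class_aware_weights` are hypothetical, as is the weighting rule. Treating the phase term as the "realism" component and the amplitude term as the "diversity" component is also an assumption made for illustration.

```python
import cmath
import math
import random


def sample_frequencies(num_freqs, dim, bandwidth, rng):
    """Draw frequencies from the spectral measure of a Gaussian kernel.

    Assumption: kernel exp(-||x - y||^2 / (2 * bandwidth^2)), whose
    spectral measure is N(0, I / bandwidth^2) by Bochner's theorem.
    """
    return [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
            for _ in range(num_freqs)]


def empirical_cf(samples, freq):
    """Empirical characteristic function: mean of exp(i * <freq, x>)."""
    total = 0j
    for x in samples:
        total += cmath.exp(1j * sum(w * v for w, v in zip(freq, x)))
    return total / len(samples)


def spectral_distance(real, synth, freqs, amp_weight=1.0, phase_weight=1.0):
    """Monte Carlo estimate of a weighted spectral discrepancy.

    Per frequency, |a*e^{ip} - b*e^{iq}|^2 splits exactly into
    (a - b)^2 (amplitude term) + 2ab(1 - cos(p - q)) (phase term).
    With both weights equal to 1 this is the plain squared difference
    of the two characteristic functions, averaged over frequencies.
    """
    total = 0.0
    for freq in freqs:
        cf_r = empirical_cf(real, freq)
        cf_s = empirical_cf(synth, freq)
        a, b = abs(cf_r), abs(cf_s)
        phase_gap = cmath.phase(cf_r) - cmath.phase(cf_s)
        amp_term = (a - b) ** 2
        phase_term = 2.0 * a * b * (1.0 - math.cos(phase_gap))
        total += amp_weight * amp_term + phase_weight * phase_term
    return total / len(freqs)


def class_aware_weights(class_count, max_count):
    """Hypothetical rule: rarer classes get a larger phase weight."""
    rarity = 1.0 - class_count / max_count  # 0.0 for the largest class
    return 1.0, 1.0 + rarity  # (amp_weight, phase_weight)


if __name__ == "__main__":
    rng = random.Random(0)
    freqs = sample_frequencies(64, 2, 1.0, rng)
    real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    synth = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    # Weights for a tail class with 25 images when the head class has 500.
    amp_w, phase_w = class_aware_weights(25, 500)
    print(spectral_distance(real, synth, freqs, amp_w, phase_w))
```

In a distillation loop, a quantity of this kind would be minimized with respect to the synthetic samples, one class at a time. With M frequencies, N samples, and d dimensions, this sketch costs O(M·N·d) per class, and a vectorized implementation would replace the inner loops with matrix products.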
Peer Reviews
Decision · Submitted to ICLR 2026
1. The point raised, that many distribution matching methods employ a linear kernel, which fails to satisfy universality, identifies a theoretical limitation of considerable merit. 2. This manuscript is well written and easy to read. Furthermore, the core idea of CSDM is simple and intuitive, making it easy to understand and apply. 3. In experiments with a high imbalance factor, CSDM achieved a significant performance gap over the baseline, which constitutes highly appropriate results for long-tailed da
1. Theoretical and experimental comparisons with several previous studies [1,2,3,4,5] that performed distribution matching considering higher moments are lacking. In particular, there is insufficient discussion of M3D [1] and IID [2], even though both are mentioned in this paper. Furthermore, the only experimental comparison with the core baseline, NCFM [4], is on the long-tailed CIFAR dataset. Therefore, there is insufficient justification to develop the argument solely based on the shortcomings of
By assigning class-specific weights to these components, CSDM emphasizes realistic fidelity for under-represented tail classes while maintaining diversity for head classes, thus dynamically handling class imbalance. This principled metric design allows the distilled synthetic set to better capture the overall data distribution, especially the rare classes that previous approaches synthesized poorly. Empirically, CSDM shows substantial gains: on the severely imbalanced CIFAR-10-LT benchmark (imb
One potential concern is the added complexity of designing and computing the spectral kernel embedding, but the authors argue that the Fourier-based implementation is efficient. It would be informative if the authors provided pseudo-code or a big-O analysis of the Fourier-related computation. Although this paper nicely derives spectral distribution-based dataset distillation, it lacks an extensive survey and discussion of existing methods. For example, FreD (https://arxiv.org/abs/2311.08819) prim
The paper identifies two important limitations of existing distribution matching approaches for dataset distillation: the use of linear kernels for MMD and treating all classes the same in imbalanced datasets. The first of these problems is not a novel observation (see [1]), but the proposed solution of using universal nonlinear kernels is a very natural and sensible one, and I am actually a bit surprised I was not able to find this explicitly in the existing literature. The proposed method is a
While I believe the general idea proposed is valuable, I have a few concerns related to the comparison with NCFM [1] and some vague/imprecise claims made in the paper. Given clarifications on these points and writing improvements (see Questions below), I would be willing to raise my score. - Comparison to NCFM: As stated in the paper, the main difference between these methods is that CSDM performs characteristic function matching with respect to the spectral measure of a universal kernel, rath
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
