Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan

TL;DR
This paper introduces novel methods for distilling long-tailed datasets, addressing bias issues in the distillation process to improve tail class performance and create efficient, balanced synthetic datasets.
Contribution
It proposes Distribution-agnostic Matching and Expert Decoupling to distill long-tailed datasets effectively, which the authors present as the first work on this problem.
Findings
Reduces bias in synthetic datasets for long-tailed distributions
Improves tail class accuracy through decoupled matching
Pioneers long-tailed dataset distillation methods
Abstract
Dataset distillation aims to synthesize a small, information-rich dataset from a large one for efficient model training. However, existing dataset distillation methods struggle with long-tailed datasets, which are prevalent in real-world scenarios. By investigating the reasons behind this unexpected result, we identified two main causes: 1) The distillation process on imbalanced datasets develops biased gradients, leading to the synthesis of similarly imbalanced distilled datasets. 2) The experts trained on such datasets perform suboptimally on tail classes, resulting in misguided distillation supervision and poor-quality soft-label initialization. To address these issues, we first propose Distribution-agnostic Matching to avoid directly matching the biased expert trajectories. It reduces the distance between the student and the biased expert trajectories and prevents the tail class…
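The abstract refers to the distance between student and expert trajectories without giving a formula. As a rough illustration only, the normalized parameter-distance loss commonly used in trajectory-matching distillation can be sketched as below. This is the generic formulation, not the paper's Distribution-agnostic Matching, and the function name, argument names, and toy values are all assumptions made for the example.

```python
import numpy as np

def trajectory_matching_loss(student_params, expert_start, expert_target):
    """Generic trajectory-matching distance (hypothetical helper, not the paper's code).

    Squared distance between the student's parameters and the expert's target
    checkpoint, normalized by how far the expert itself moved between its
    start and target checkpoints.
    """
    student_params = np.asarray(student_params, dtype=float)
    expert_start = np.asarray(expert_start, dtype=float)
    expert_target = np.asarray(expert_target, dtype=float)
    numerator = np.sum((student_params - expert_target) ** 2)
    denominator = np.sum((expert_start - expert_target) ** 2)
    return numerator / denominator

# Toy flattened parameter vectors (assumed values for illustration).
expert_start = np.array([0.0, 0.0])
expert_target = np.array([1.0, 1.0])
student_params = np.array([0.5, 1.0])

loss = trajectory_matching_loss(student_params, expert_start, expert_target)
print(loss)
```

A loss of 0 means the student landed exactly on the expert's target checkpoint, and a loss of 1 means it made no net progress from the expert's starting point. If the expert was trained on imbalanced data, minimizing this distance directly would pull the synthetic data toward the expert's bias, which is the problem the abstract describes.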
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Text and Document Classification Technologies
Methods: Attentive Walk-Aggregating Graph Neural Network
