Up to 100× Faster Data-free Knowledge Distillation
Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song

TL;DR
FastDFKD introduces a meta-synthesizer that reuses features shared across training data, accelerating data-free knowledge distillation by up to 100 times while maintaining competitive performance.
Contribution
The paper proposes a novel meta-synthesizer strategy for rapid data synthesis in DFKD, significantly reducing training time while preserving accuracy.
Findings
Achieves up to 100× faster DFKD training.
Maintains comparable performance to state-of-the-art methods.
Validates effectiveness on CIFAR, NYUv2, and ImageNet datasets.
Abstract
Data-free knowledge distillation (DFKD) has recently been attracting increasing attention from research communities, attributed to its capability to compress a model only using synthetic data. Despite the encouraging results achieved, state-of-the-art DFKD methods still suffer from the inefficiency of data synthesis, making the data-free training process extremely time-consuming and thus inapplicable for large-scale tasks. In this work, we introduce an efficacious scheme, termed as FastDFKD, that allows us to accelerate DFKD by a factor of orders of magnitude. At the heart of our approach is a novel strategy to reuse the shared common features in training data so as to synthesize different data instances. Unlike prior methods that optimize a set of data independently, we propose to learn a meta-synthesizer that seeks common features as the initialization for the fast data synthesis. As…
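The idea of learning a shared initialization for fast synthesis can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a Reptile-style meta-update, a quadratic loss standing in for the real synthesis objective, and hypothetical names (`synthesize`, `meta_train`, `common`). It only shows why starting from common features may need fewer optimization steps than starting from scratch.

```python
import random

def synthesize(init, target, lr=0.1, tol=1e-3, max_steps=1000):
    """Gradient descent on 0.5 * ||x - target||^2, starting from init.

    Returns the synthesized point and the number of update steps taken.
    The quadratic loss is a stand-in (assumption) for a real synthesis loss.
    """
    x = list(init)
    for step in range(max_steps):
        grad = [xi - ti for xi, ti in zip(x, target)]
        if max(abs(g) for g in grad) < tol:
            return x, step
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x, max_steps

def meta_train(targets, dim, meta_lr=0.5, inner_steps=5, epochs=20):
    """Reptile-style meta-learning (assumption, not the paper's exact rule):
    nudge the shared initialization toward each short-run solution."""
    init = [0.0] * dim
    for _ in range(epochs):
        for t in targets:
            x, _ = synthesize(init, t, max_steps=inner_steps)
            init = [i + meta_lr * (xi - i) for i, xi in zip(init, x)]
    return init

rng = random.Random(0)
common = [5.0, -3.0, 2.0]  # hypothetical "shared features"
# Each instance is the common part plus a small instance-specific offset.
targets = [[c + rng.uniform(-0.1, 0.1) for c in common] for _ in range(8)]

meta_init = meta_train(targets, dim=3)

# Synthesize a new, unseen instance from scratch and from the meta-init.
new_target = [c + rng.uniform(-0.1, 0.1) for c in common]
_, steps_scratch = synthesize([0.0] * 3, new_target)
_, steps_meta = synthesize(meta_init, new_target)
print(steps_scratch, steps_meta)
```

In this toy setting the meta-learned initialization already sits near the common component, so only the small instance-specific offset remains to be optimized. The step savings here are modest and say nothing about the speedups reported in the paper.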
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
Methods: Knowledge Distillation
