Large-Scale Generative Data-Free Distillation
Liangchen Luo, Mark Sandler, Zi Lin, Andrey Zhmoginov, Andrew Howard

TL;DR
This paper introduces a scalable data-free knowledge distillation method in which generative models are trained from the normalization-layer statistics of the teacher network, achieving strong results on CIFAR-10 and CIFAR-100 and scaling to large datasets such as ImageNet without access to the original data.
Contribution
The paper proposes an approach for training generative models for data-free distillation by exploiting the statistics stored in the teacher network's normalization layers, which lets the method scale to large datasets.
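As a rough illustration of what "exploiting normalization layer statistics" can mean, the sketch below compares the per-channel mean and variance of a batch of activations against the running statistics a trained normalization layer would store. The squared-difference form, the function names `channel_stats` and `moment_matching_loss`, and the nested-list data layout are assumptions made for this example; the paper's actual loss may take a different form.

```python
from statistics import fmean, pvariance


def channel_stats(batch):
    """Per-channel mean and (population) variance over a batch.

    `batch` is a list of samples; each sample is a list of channels;
    each channel is a flat list of activation values (hypothetical layout).
    """
    n_channels = len(batch[0])
    means, variances = [], []
    for c in range(n_channels):
        # Pool every activation of channel c across the whole batch.
        values = [v for sample in batch for v in sample[c]]
        means.append(fmean(values))
        variances.append(pvariance(values))
    return means, variances


def moment_matching_loss(batch, running_mean, running_var):
    """Squared distance between batch statistics and stored running statistics.

    Assumption: a plain squared-difference penalty, used here only as a
    stand-in for whatever statistic-matching loss the method employs.
    """
    means, variances = channel_stats(batch)
    mean_term = sum((m - rm) ** 2 for m, rm in zip(means, running_mean))
    var_term = sum((v - rv) ** 2 for v, rv in zip(variances, running_var))
    return mean_term + var_term
```

In a setup like this, a generator would be updated to lower the loss, so that the activations its images induce in the teacher resemble those induced by the unseen training data.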
Findings
Reaches 95.02% accuracy on CIFAR-10 with data-free distillation
Reaches 77.02% accuracy on CIFAR-100 with data-free distillation
Scales to the ImageNet dataset
Abstract
Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this problem, but they are either highly time-consuming or unable to scale to large datasets. To this end, we propose a new method to train a generative image model by leveraging the intrinsic normalization layers' statistics of the trained teacher network. This enables us to build an ensemble of generators without training data that can efficiently produce substitute inputs for subsequent distillation. The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100…
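The distillation step the abstract refers to is commonly written as a temperature-scaled KL divergence between teacher and student predictions on the substitute inputs. The sketch below shows that standard formulation only; the function names and the default temperature are assumptions for this example, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) on softened predictions, scaled by T^2.

    Assumption: the standard knowledge-distillation objective, used here
    as a generic example of the distillation stage.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's softened distribution and grows as the two diverge, so no ground-truth labels are needed for this stage.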
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Seismic Imaging and Inversion Techniques
