Large-Scale Generative Data-Free Distillation

Liangchen Luo; Mark Sandler; Zi Lin; Andrey Zhmoginov; Andrew Howard

arXiv:2012.05578·cs.LG·December 11, 2020·28 cites

Open Access

TL;DR

This paper introduces a scalable data-free knowledge distillation method using generative models that leverage teacher network statistics, achieving high performance on large datasets like ImageNet without access to original data.

Contribution

It proposes a novel approach to train generative models for data-free distillation by exploiting normalization layer statistics, enabling scaling to large datasets.
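As a rough illustration of what "exploiting normalization layer statistics" can mean, the sketch below scores a batch of intermediate features against the running mean and variance stored in one of the teacher's batch-normalization layers. The per-channel Gaussian KL divergence, the function name `bn_stat_loss`, and the `(N, C, H, W)` layout are assumptions made for this example; this page does not state the exact objective the paper uses.

```python
import numpy as np

def bn_stat_loss(features, running_mean, running_var, eps=1e-5):
    """Hypothetical moment-matching loss for one normalization layer.

    features:     activations entering the layer, shape (N, C, H, W)
    running_mean: per-channel mean stored in the teacher, shape (C,)
    running_var:  per-channel variance stored in the teacher, shape (C,)

    Returns the sum over channels of KL(N(batch stats) || N(stored stats)).
    The KL form is an assumption, not taken from the paper.
    """
    # Per-channel statistics of the generated batch.
    batch_mean = features.mean(axis=(0, 2, 3))
    batch_var = features.var(axis=(0, 2, 3))

    v_batch = batch_var + eps
    v_teacher = running_var + eps

    # Closed-form KL divergence between two univariate Gaussians.
    kl = 0.5 * (
        np.log(v_teacher / v_batch)
        + (v_batch + (batch_mean - running_mean) ** 2) / v_teacher
        - 1.0
    )
    return float(kl.sum())
```

In a full pipeline a loss of this kind would be summed over every normalization layer of the teacher and minimized with respect to the generator's parameters, so that generated images induce activations whose statistics resemble those the teacher saw during training.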

Findings

01. Achieves 95.02% accuracy on CIFAR-10

02. Achieves 77.02% accuracy on CIFAR-100

03. Successfully scales to the ImageNet dataset

Abstract

Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression, and semi-supervised learning. Most existing distillation approaches require access to original or augmented training samples. This can be problematic in practice because of privacy, proprietary, and availability concerns. Recent work has proposed methods to tackle this problem, but they are either highly time-consuming or unable to scale to large datasets. To this end, we propose a new method to train a generative image model by leveraging the statistics stored in the normalization layers of the trained teacher network. This enables us to build an ensemble of generators, without training data, that can efficiently produce substitute inputs for subsequent distillation. The proposed method pushes forward the data-free distillation performance on CIFAR-10 and CIFAR-100…
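The "subsequent distillation" step mentioned in the abstract can be pictured with a standard temperature-scaled distillation loss, in which the student is trained to match the teacher's softened predictions on the substitute inputs. The sketch below is a generic Hinton-style formulation; the function names, the temperature parameter, and the `T^2` scaling are assumptions for illustration and are not confirmed as the loss used in the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of (N, K) logits with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over a batch, scaled by temperature^2.

    Both inputs have shape (N, K): N substitute inputs, K classes.
    This is a generic sketch, not the paper's confirmed objective.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    tiny = 1e-12  # avoids log(0)
    kl = (p_teacher * (np.log(p_teacher + tiny) - np.log(p_student + tiny))).sum(axis=1)
    return float(kl.mean() * temperature ** 2)
```

Here the logits would come from running the teacher and the student on a batch produced by the generators rather than on real training images, which is what makes the procedure data-free.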

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Seismic Imaging and Inversion Techniques