Dual Discriminator Adversarial Distillation for Data-free Model Compression
Haoran Zhao, Xin Sun, Junyu Dong, Hui Yu, Huiyu Zhou

TL;DR
This paper introduces a data-free knowledge distillation method called Dual Discriminator Adversarial Distillation (DDAD) that creates synthetic data to train compact neural networks without access to original training data.
Contribution
The paper proposes a novel data-free distillation approach using dual discriminator adversarial training to generate synthetic data for effective model compression.
Findings
Outperforms existing data-free distillation methods on classification benchmarks.
Effective for semantic segmentation tasks on multiple datasets.
Produces compact models closely matching teacher performance without original data.
Abstract
Knowledge distillation has been widely used to produce portable and efficient neural networks that can be readily deployed on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods require access to the original training data, which is usually very large and often unavailable. To tackle this problem, we propose a novel data-free approach, named Dual Discriminator Adversarial Distillation (DDAD), to distill a neural network without any training data or meta-data. Specifically, we use a generator to create samples that mimic the original training data through dual discriminator adversarial distillation. The generator not only uses the intrinsic statistics stored in the pre-trained teacher's existing batch normalization layers but also seeks the maximum discrepancy from the student model. Then the generated samples are used to…
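The abstract names two signals that drive the generator: matching the statistics stored in the teacher's batch normalization layers, and maximizing the discrepancy between teacher and student outputs. The sketch below combines them into one generator objective, written in plain Python so it runs without a deep learning framework. It is an illustration under stated assumptions, not the authors' implementation: the BN term is assumed to be a DeepInversion-style sum of L2 distances between batch and running mean/variance, the discrepancy is assumed to be mean absolute error on logits, and the function names and the `bn_weight` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Rows are samples in the generated batch, columns are channels (or classes for logits).
Matrix = Sequence[Sequence[float]]


def batch_stats(features: Matrix) -> Tuple[List[float], List[float]]:
    """Per-channel mean and biased variance over the batch, as BN computes in training mode."""
    n = len(features)
    channels = len(features[0])
    means = [sum(row[c] for row in features) / n for c in range(channels)]
    variances = [sum((row[c] - means[c]) ** 2 for row in features) / n for c in range(channels)]
    return means, variances


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bn_statistics_loss(layer_features: List[Matrix],
                       running_means: List[List[float]],
                       running_vars: List[List[float]]) -> float:
    """Assumed form: sum over BN layers of ||mu(x) - mu_BN||_2 + ||var(x) - var_BN||_2.

    layer_features[l] holds the teacher's activations at the input of BN layer l
    for the generated batch; running_means/running_vars are that layer's stored stats.
    """
    loss = 0.0
    for feats, mu_bn, var_bn in zip(layer_features, running_means, running_vars):
        mu, var = batch_stats(feats)
        loss += l2_distance(mu, mu_bn) + l2_distance(var, var_bn)
    return loss


def discrepancy(teacher_logits: Matrix, student_logits: Matrix) -> float:
    """Assumed discrepancy measure: mean absolute error between teacher and student logits."""
    diffs = [abs(t - s)
             for t_row, s_row in zip(teacher_logits, student_logits)
             for t, s in zip(t_row, s_row)]
    return sum(diffs) / len(diffs)


def generator_loss(layer_features: List[Matrix],
                   running_means: List[List[float]],
                   running_vars: List[List[float]],
                   teacher_logits: Matrix,
                   student_logits: Matrix,
                   bn_weight: float = 1.0) -> float:
    """Quantity the generator minimizes: stay close to the teacher's BN statistics
    while pushing teacher and student apart (the minus sign maximizes discrepancy).
    bn_weight is a hypothetical balancing coefficient."""
    return (bn_weight * bn_statistics_loss(layer_features, running_means, running_vars)
            - discrepancy(teacher_logits, student_logits))


if __name__ == "__main__":
    feats = [[[1.0, 2.0], [3.0, 4.0]]]  # one BN layer, batch of 2 samples, 2 channels
    print(bn_statistics_loss(feats, [[0.0, 3.0]], [[1.0, 1.0]]))  # 2.0
    print(discrepancy([[1.0, -1.0]], [[0.0, 0.0]]))                # 1.0
    print(generator_loss(feats, [[0.0, 3.0]], [[1.0, 1.0]],
                         [[1.0, -1.0]], [[0.0, 0.0]]))             # 1.0
```

In adversarial distillation schemes of this kind, the student is typically updated in alternation to minimize the same discrepancy on the generated samples; that step is not shown here. In a real framework the per-layer features would be collected from the teacher's BN layers during the forward pass rather than passed in by hand.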
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Knowledge Distillation · Batch Normalization
