TL;DR
This paper introduces a knowledge distillation framework in which an ensemble of compressed student models is trained jointly with its ensemble teacher, without pretrained weights, so that a single training run yields several compressed models with improved accuracy.
Contribution
It proposes a novel ensemble-based knowledge distillation method that allows simultaneous training of multiple compressed models with improved performance.
Findings
Achieved 97% compression with 10.64% accuracy gain on ResNet110.
Achieved 95% compression with 8.17% accuracy gain on DenseNet-BC.
Validated effectiveness on state-of-the-art classification models.
Abstract
This paper presents a novel knowledge-distillation-based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each model learns unique representations from the data distribution due to its distinct architecture. This helps the ensemble generalize better by combining every model's knowledge. The distilled students and ensemble teacher are trained simultaneously without requiring any pretrained weights. Moreover, our proposed method can deliver multi-compressed students with single training, which is efficient and flexible for different scenarios. We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness. Notably, using our framework, a 97% compressed ResNet110 student model managed to produce a 10.64%…
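The abstract does not spell out the training objective, so the following is only a minimal sketch of how a jointly trained ensemble teacher could be distilled onto each student. It makes several assumptions that are not taken from the paper: the ensemble teacher's logits are the mean of the student logits, each student uses a standard temperature-scaled distillation loss (cross-entropy plus KL divergence to the teacher's softened output), and the names `ensemble_distillation_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one example against its integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_distillation_loss(student_logits, label, temperature=2.0, alpha=0.5):
    """Joint loss for one example (illustrative, not the paper's exact objective).

    student_logits: one logit list per student model.
    """
    n_students = len(student_logits)
    n_classes = len(student_logits[0])

    # Assumption: the ensemble teacher is the mean of the student logits.
    teacher_logits = [
        sum(s[c] for s in student_logits) / n_students for c in range(n_classes)
    ]
    teacher_soft = softmax(teacher_logits, temperature)

    # The ensemble itself is supervised by the ground-truth label.
    loss = cross_entropy(teacher_logits, label)

    # Each student is pulled toward both the label and the ensemble's soft output.
    for logits in student_logits:
        hard = cross_entropy(logits, label)
        soft = kl_divergence(teacher_soft, softmax(logits, temperature))
        loss += (1 - alpha) * hard + alpha * temperature ** 2 * soft
    return loss


# Two hypothetical students disagreeing on a 3-class example with label 0.
example_loss = ensemble_distillation_loss([[2.0, 0.5, 0.1], [0.3, 1.5, 0.2]], label=0)
```

Because every term depends on the student logits, a single backward pass through this loss would update all students and the ensemble at once, which is one way to read the abstract's claim that no pretrained teacher is needed.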
Taxonomy
Methods: Knowledge Distillation
