Online Ensemble Model Compression using Knowledge Distillation

Devesh Walawalkar; Zhiqiang Shen; Marios Savvides

arXiv:2011.07449·cs.CV·November 17, 2020

TL;DR

This paper introduces a knowledge distillation framework in which an ensemble of compressed student models is trained jointly and the ensemble's knowledge is distilled into each student. It requires no pretrained weights and produces multiple compressed models from a single training run.

Contribution

It proposes an ensemble-based knowledge distillation method that trains multiple compressed student models simultaneously with the ensemble teacher, improving each student's accuracy over training it individually.

Findings

01. A 97% compressed ResNet110 student model produced a 10.64% accuracy gain.

02. A 95% compressed DenseNet-BC student model produced an 8.17% accuracy gain.

03. The framework was validated on state-of-the-art classification models.
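
The arithmetic behind figures like these can be sketched as follows. This is only an illustration: it assumes "compression" means the percentage reduction in parameter count, and because the text here does not say whether the gain is absolute or relative, it computes both. All function names and numbers are hypothetical and are not taken from the paper.

```python
def compression_percent(original_params, student_params):
    """Percentage of parameters removed (assumed definition of compression)."""
    return 100.0 * (1.0 - student_params / original_params)


def absolute_gain(baseline_acc, distilled_acc):
    """Accuracy gain in percentage points."""
    return distilled_acc - baseline_acc


def relative_gain_percent(baseline_acc, distilled_acc):
    """Accuracy gain as a percentage of the baseline accuracy."""
    return 100.0 * (distilled_acc - baseline_acc) / baseline_acc


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
original_params = 1_000_000
student_params = 30_000
baseline_acc, distilled_acc = 50.0, 55.0

print(compression_percent(original_params, student_params))
print(absolute_gain(baseline_acc, distilled_acc))
print(relative_gain_percent(baseline_acc, distilled_acc))
```

Under these definitions a student with 3% of the original parameters counts as 97% compressed, and the same accuracy improvement reads very differently depending on whether it is reported in percentage points or relative to the baseline.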

Abstract

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each model learns unique representations from the data distribution due to its distinct architecture. This helps the ensemble generalize better by combining every model's knowledge. The distilled students and ensemble teacher are trained simultaneously without requiring any pretrained weights. Moreover, our proposed method can deliver multi-compressed students with single training, which is efficient and flexible for different scenarios. We provide comprehensive experiments using state-of-the-art classification models to validate our framework's effectiveness. Notably, using our framework a 97% compressed ResNet110 student model managed to produce a 10.64%…
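
The training signal described in the abstract can be sketched roughly as follows, for a single sample and using only the Python standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: the ensemble teacher is assumed to be the element-wise mean of the students' logits, and the temperature, the weighting factor `alpha`, and all function names are hypothetical choices that the text above does not specify.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_logits(student_logits):
    """Assumed ensemble teacher: element-wise mean of the students' logits."""
    count = len(student_logits)
    return [sum(column) / count for column in zip(*student_logits)]


def online_ensemble_kd_loss(student_logits, label, temperature=2.0, alpha=0.5):
    """Joint loss for one sample: the ensemble and every student are
    supervised by the label, and each student is also pulled toward the
    ensemble's softened output (the distillation term)."""
    teacher = ensemble_logits(student_logits)
    teacher_soft = softmax(teacher, temperature)
    loss = cross_entropy(teacher, label)  # supervised term for the ensemble
    for logits in student_logits:
        supervised = cross_entropy(logits, label)
        distill = kl_divergence(teacher_soft, softmax(logits, temperature))
        # The temperature**2 factor is the usual rescaling of the soft-target term.
        loss += (1 - alpha) * supervised + alpha * temperature ** 2 * distill
    return loss


# Hypothetical logits from three students with different architectures.
students = [[2.0, 0.5, -1.0], [1.0, 1.5, 0.0], [0.5, 0.2, 0.1]]
print(online_ensemble_kd_loss(students, label=0))
```

Because the teacher is computed from the students' own outputs at every step, nothing has to be pretrained: the ensemble and the students improve together, and each trained student can be deployed on its own afterwards.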

Code & Models

Repositories

Devwalkar/BOC-KD (PyTorch)

Taxonomy

Methods: Knowledge Distillation