Model Compression Using Optimal Transport

Suhas Lohit; Michael Jones

arXiv:2012.03907 · cs.CV · December 8, 2020

TL;DR

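This paper uses optimal transport-based loss functions for knowledge distillation: a smaller student network is trained so that the distribution of its features moves closer to that of a larger teacher network. On image classification tasks, these losses perform comparably to or better than other loss functions.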

Contribution

It proposes a new optimal transport-based loss for knowledge distillation, enhancing the alignment of student and teacher feature distributions.

Findings

01

The proposed optimal transport loss functions perform comparably to or better than other loss functions.

02

Student networks trained with the optimal transport loss are evaluated on image classification with CIFAR-100, SVHN, and ImageNet.

03

As a model compression method, the approach targets easier deployment of deep models in compute-, memory-, and energy-constrained environments such as mobile phones.

Abstract

Model compression methods are important to allow for easier deployment of deep learning models in compute-, memory- and energy-constrained environments such as mobile phones. Knowledge distillation is a class of model compression algorithms in which knowledge from a large teacher network is transferred to a smaller student network, thereby improving the student's performance. In this paper, we show how optimal transport-based loss functions can be used for training a student network, encouraging the learning of student network parameters that help bring the distribution of student features closer to that of the teacher features. We present image classification results on CIFAR-100, SVHN and ImageNet and show that the proposed optimal transport loss functions perform comparably to or better than other loss functions.
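To make the idea of "bringing the distribution of student features closer to that of the teacher features" more concrete, here is a minimal sketch of an entropy-regularized optimal transport cost (Sinkhorn iterations) between a batch of student features and a batch of teacher features. This is not the paper's implementation: the abstract does not specify the exact loss, so the squared Euclidean ground cost, the uniform weights, the function name `sinkhorn_ot_loss`, and the parameters `epsilon` and `n_iters` are all assumptions made for illustration. The sketch only computes the loss value in plain Python; real training would need an automatic differentiation framework.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_ot_loss(student_feats, teacher_feats, epsilon=0.1, n_iters=200):
    """Approximate OT cost between two batches of feature vectors.

    Hypothetical helper: uniform weights on both batches, squared Euclidean
    ground cost, entropic regularization solved with Sinkhorn iterations.
    """
    n, m = len(student_feats), len(teacher_feats)
    cost = [[squared_distance(s, t) for t in teacher_feats] for s in student_feats]

    # Scale epsilon by the largest cost so exp() cannot underflow to zero.
    scale = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / (epsilon * scale)) for c in row] for row in cost]

    a = [1.0 / n] * n  # uniform mass on student features
    b = [1.0 / m] * m  # uniform mass on teacher features
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan entry is u[i] * kernel[i][j] * v[j]; the loss is the
    # total cost of moving student mass onto teacher mass under that plan.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


student = [[0.0, 0.0], [1.0, 0.0]]
aligned_teacher = [[0.0, 0.0], [1.0, 0.0]]
shifted_teacher = [[3.0, 0.0], [4.0, 0.0]]

print(sinkhorn_ot_loss(student, aligned_teacher))  # small value close to zero
print(sinkhorn_ot_loss(student, shifted_teacher))  # noticeably larger value
```

When the student features already match the teacher features, the cost is near zero; as the two batches drift apart, it grows. Minimizing such a quantity with respect to the student's parameters is one way a distillation loss could pull the student's feature distribution toward the teacher's.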

Taxonomy

Methods: Knowledge Distillation