Recurrent knowledge distillation

Silvia L. Pintea; Yue Liu; Jan C. van Gemert

arXiv:1805.07170 · cs.CV · May 21, 2018

TL;DR

This paper introduces a recurrent knowledge distillation method that shrinks the student network by recasting multiple residual layers of the teacher network into a single recurrent student layer, reducing the number of parameters at little loss in accuracy.

Contribution

It proposes three variants of adding recurrent connections to the student network, each of which reduces the parameter count at little loss in accuracy.

Findings

01 Reduced parameter count on CIFAR-10, Scenes, and MiniPlaces

02 Little loss in accuracy despite fewer parameters

03 Recurrent student layers shown to be effective in knowledge distillation

Abstract

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
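
The parameter saving described in the abstract comes from weight sharing: a recurrent layer reuses one set of weights across several unrolled steps, whereas a stack of residual layers stores a separate set per layer. The sketch below is only an illustration of that idea under stated assumptions. The function names, the choice of two 3x3 convolutions per block, and the channel counts are hypothetical and are not taken from the paper, and the sketch does not reproduce any of the paper's three variants.

```python
# Illustrative sketch only: all names and sizes here are assumptions,
# not the architecture or the variants used in the paper.

def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolution."""
    return in_ch * out_ch * k * k + out_ch


def residual_block_params(channels, convs_per_block=2):
    """Parameters of one residual block (assumed: two same-width convolutions)."""
    return convs_per_block * conv_params(channels, channels)


def stacked_params(channels, num_blocks):
    """num_blocks residual blocks, each holding its own weights."""
    return num_blocks * residual_block_params(channels)


def recurrent_params(channels, num_steps):
    """One block whose weights are reused at every unrolled step.

    The count does not depend on num_steps, which is the source of the saving.
    """
    return residual_block_params(channels)


def recurrent_forward(x, block, num_steps):
    """Apply the same residual update repeatedly: x <- x + block(x)."""
    for _ in range(num_steps):
        x = [xi + bi for xi, bi in zip(x, block(x))]
    return x


if __name__ == "__main__":
    channels, depth = 16, 3
    print("stacked  :", stacked_params(channels, depth))
    print("recurrent:", recurrent_params(channels, depth))
    # Toy "block" that halves its input, shared across both steps.
    print(recurrent_forward([1.0, 2.0], lambda v: [0.5 * a for a in v], 2))
```

In this toy setting, three stacked blocks cost three times the parameters of a single shared block unrolled for three steps, while both apply the same number of residual updates to the input.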
