Knowledge Distillation via Route Constrained Optimization

Xiao Jin; Baoyun Peng; Yichao Wu; Yu Liu; Jiaheng Liu; Ding Liang; Junjie Yan; Xiaolin Hu

arXiv:1904.09149·cs.LG·April 22, 2019·36 cites

TL;DR

This paper introduces route constrained optimization (RCO), a knowledge distillation method inspired by curriculum learning. It improves the training of small neural networks by supervising them with anchor points selected from the route the teacher model passed through in parameter space, rather than with the converged teacher alone.

Contribution

The paper proposes RCO, which supervises the student with anchor points taken along the teacher's route in parameter space instead of only the converged teacher. This lowers the congruence loss between student and teacher and improves performance on classification and face recognition tasks.

Findings

01 RCO improves accuracy on CIFAR100 by 2.14%.

02 RCO enhances ImageNet performance by 1.5%.

03 RCO demonstrates better generalization on MegaFace face recognition.

Abstract

Distillation-based learning boosts the performance of a miniaturized neural network based on the hypothesis that the representation of a teacher model can be used as structured and relatively weak supervision, and would thus be easily learned by a miniaturized model. However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss. In this work, we consider knowledge distillation from the perspective of curriculum learning by routing. Instead of supervising the student model with a converged teacher model, we supervise it with some anchor points selected from the route in parameter space that the teacher model passed through, which we call route constrained optimization (RCO). We experimentally demonstrate this simple operation greatly…
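The anchor-point idea in the abstract can be illustrated with a minimal sketch. It assumes the teacher's training route is available as a list of saved checkpoints and that anchors are picked at evenly spaced positions along it. The even spacing, the temperature-softened KL loss, and the names `kd_loss`, `select_anchors`, and `anchor_for_epoch` are illustrative assumptions, not details taken from this page.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) on softened outputs (an assumed loss form)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def select_anchors(num_checkpoints, num_anchors):
    """Pick evenly spaced checkpoint indices along the teacher's route.

    Assumption: even spacing. The last anchor is always the final
    (converged) checkpoint, so the student ends on the usual KD target.
    """
    if not 1 <= num_anchors <= num_checkpoints:
        raise ValueError("need 1 <= num_anchors <= num_checkpoints")
    return [((i + 1) * num_checkpoints) // num_anchors - 1
            for i in range(num_anchors)]


def anchor_for_epoch(epoch, epochs_per_anchor, anchors):
    """Return the teacher checkpoint index that supervises a student epoch."""
    stage = min(epoch // epochs_per_anchor, len(anchors) - 1)
    return anchors[stage]


if __name__ == "__main__":
    anchors = select_anchors(num_checkpoints=240, num_anchors=4)
    for epoch in (0, 10, 20, 30):
        print(epoch, anchor_for_epoch(epoch, 10, anchors))
    print(kd_loss([0.0, 0.0], [2.0, 0.0], temperature=2.0))
```

In a real training loop, the student would minimize `kd_loss` against the outputs of the checkpoint returned by `anchor_for_epoch`, moving from early, easier-to-match teachers toward the converged one.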

Code & Models

Repositories

SforAiDl/KD_Lib
pytorch

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM

Methods: Knowledge Distillation