Efficient Sub-structured Knowledge Distillation

Wenye Lin; Yangming Li; Lemao Liu; Shuming Shi; Hai-tao Zheng

arXiv:2203.04825 · cs.LG · March 10, 2022 · 1 citation



TL;DR

This paper introduces an efficient knowledge distillation method for structured prediction models that matches sub-structure predictions, avoiding complex decoding and enabling faster training.

Contribution

It proposes a simple, efficient approach that transfers knowledge by locally matching sub-structures, reducing training time and improving performance.

Findings

01. Outperforms previous methods on structured prediction tasks.
02. Halves the training time per epoch.
03. Encourages better internal mimicry of teacher models.

Abstract

Structured prediction models aim at solving a type of problem where the output is a complex structure, rather than a single variable. Performing knowledge distillation for such models is not trivial due to their exponentially large output space. In this work, we propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches. Specifically, we transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space. In this manner, we avoid adopting some time-consuming techniques like dynamic programming (DP) for decoding output structures, which permits parallel computation and makes the training process even faster in practice. Besides, it encourages the student model to better mimic the internal behavior of the teacher model. Experiments on…
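The local matching described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible sub-structure, a single token's label in a sequence-labeling task, and normalizes teacher and student scores independently at each position. The function names, the toy scores, and this choice of sub-structure are illustrative assumptions, not the authors' formulation or the code in the official repository.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one sub-structure's label scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def local_kd_loss(teacher_scores, student_scores):
    """Mean cross-entropy between teacher and student local distributions.

    Each argument holds one entry per sub-structure (assumed here: one per
    token), and each entry is a list of unnormalized label scores. Every
    position is handled independently, so no dynamic programming over whole
    output sequences is needed and the loop could run in parallel.
    """
    total = 0.0
    for t_scores, s_scores in zip(teacher_scores, student_scores):
        teacher_probs = [math.exp(lp) for lp in log_softmax(t_scores)]
        student_log_probs = log_softmax(s_scores)
        total += -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))
    return total / len(teacher_scores)


# Toy example (hypothetical scores): 4 tokens, 3 candidate labels each.
teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3], [0.0, 0.0, 3.0], [1.0, 1.0, 0.2]]
student = [[1.0, 0.8, -0.5], [0.2, 0.9, 0.4], [0.5, 0.1, 1.5], [0.3, 1.2, 0.0]]

loss = local_kd_loss(teacher, student)

# Size of the whole output space versus the number of local predictions.
num_labels, seq_len = 3, 4
num_structures = num_labels ** seq_len      # every full label sequence
num_local_scores = num_labels * seq_len     # per-token label scores
```

In this toy setting the whole output space grows exponentially with sequence length (`num_structures`), while the number of local predictions to match grows only linearly (`num_local_scores`), which is the source of the efficiency gain the abstract describes.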


Code & Models

Repositories

Linwenye/Efficient-KD (PyTorch, official)


Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · Anomaly Detection Techniques and Applications

Methods: Knowledge Distillation