Efficient Sub-structured Knowledge Distillation
Wenye Lin, Yangming Li, Lemao Liu, Shuming Shi, Hai-tao Zheng

TL;DR
This paper introduces an efficient knowledge distillation method for structured prediction models that matches sub-structure predictions, avoiding complex decoding and enabling faster training.
Contribution
The paper proposes a simple, efficient approach that transfers knowledge by locally matching the teacher's and student's predictions on sub-structures, reducing training time and improving performance.
Findings
Outperforms previous methods on structured prediction tasks.
Halves the training time per epoch.
Encourages better internal mimicry of teacher models.
Abstract
Structured prediction models aim at solving a type of problem where the output is a complex structure, rather than a single variable. Performing knowledge distillation for such models is not trivial due to their exponentially large output space. In this work, we propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches. Specifically, we transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space. In this manner, we avoid adopting some time-consuming techniques like dynamic programming (DP) for decoding output structures, which permits parallel computation and makes the training process even faster in practice. Besides, it encourages the student model to better mimic the internal behavior of the teacher model. Experiments on…
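The local matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sub-structure (for example, a single position or a pair of adjacent labels in a sequence) comes with a vector of local scores over its candidate values, and that the teacher's and student's scores are each normalized with a softmax and compared by cross-entropy. The names `log_softmax` and `substructure_kd_loss` are hypothetical, and the paper's exact loss may differ.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one sub-structure's candidate scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def substructure_kd_loss(teacher_scores, student_scores):
    """Mean cross-entropy between teacher and student local distributions.

    Assumption (not from the paper): both arguments are lists of rows,
    one row per sub-structure, each row holding that sub-structure's
    local scores over its candidates. Every row is normalized on its own,
    so no dynamic programming over the full output space is needed and
    the rows could be processed in parallel.
    """
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        t_log = log_softmax(t_row)
        s_log = log_softmax(s_row)
        # Cross-entropy H(teacher, student) for this sub-structure.
        total += -sum(math.exp(tl) * sl for tl, sl in zip(t_log, s_log))
    return total / len(teacher_scores)


# Toy usage: two sub-structures with three candidates each.
teacher = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
student = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]]
loss_mismatch = substructure_kd_loss(teacher, student)
loss_match = substructure_kd_loss(teacher, teacher)
```

Under these assumptions the loss reaches its minimum, the teacher's own entropy, when the student reproduces the teacher's local distributions, so `loss_mismatch` is larger than `loss_match`.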
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Anomaly Detection Techniques and Applications
Methods: Knowledge Distillation
