Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
Xinyu Wang, Yong Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

TL;DR
This paper introduces a tractable method for knowledge distillation in structured prediction tasks, enabling effective transfer of structured information between models with different output factorizations.
Contribution
It derives a factorized, computationally feasible knowledge distillation objective for structured prediction models, applicable across various factorization scenarios.
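The following is a worked sketch of why such a factorization removes the exponential sum. It assumes the student is a log-linear model whose score decomposes over a set of substructures U_s(x), such as labels and label pairs for a CRF or arcs for a first-order parser. The notation is ours and may differ from the paper's.

```latex
\begin{aligned}
P_s(y \mid x) &= \frac{\exp\big(\sum_{u \in y} \mathrm{Score}_s(u, x)\big)}{Z_s(x)} \\
\mathcal{L}_{\mathrm{KD}} &= -\sum_{y \in \mathcal{Y}(x)} P_t(y \mid x) \log P_s(y \mid x) \\
&= -\sum_{y \in \mathcal{Y}(x)} P_t(y \mid x) \sum_{u \in y} \mathrm{Score}_s(u, x) + \log Z_s(x) \\
&= -\sum_{u \in \mathcal{U}_s(x)} \mathrm{Score}_s(u, x) \sum_{y \ni u} P_t(y \mid x) + \log Z_s(x) \\
&= -\sum_{u \in \mathcal{U}_s(x)} P_t(u \mid x)\, \mathrm{Score}_s(u, x) + \log Z_s(x)
\end{aligned}
```

The third line uses the fact that the teacher probabilities sum to one over Y(x). The final form no longer enumerates output structures. It needs only the student's log-partition function and the teacher's marginal probability P_t(u | x) of each student substructure, and both can be computed by dynamic programming when the factorizations allow it.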
Findings
Effective distillation between sequence labeling models
Successful application to dependency parsing
Tractable optimization for complex structured outputs
Abstract
Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a more fine-grained one (the student). The objective function of knowledge distillation is typically the cross-entropy between the teacher's and the student's output distributions. However, for structured prediction problems, the output space is exponential in size; therefore, the cross-entropy objective becomes intractable to compute and optimize directly. In this paper, we derive a factorized form of the knowledge distillation objective for structured prediction, which is tractable for many typical choices of the teacher and student models. In particular, we show the tractability and empirical effectiveness of structural knowledge distillation between sequence labeling and dependency parsing models under four different scenarios: 1) the teacher and student…
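The following minimal Python sketch makes the tractability claim concrete for the simplest case. It assumes that the teacher and the student are both linear-chain CRFs sharing one factorization, with per-position labels and adjacent label pairs as substructures. The scores are random placeholders, and every function name (`forward_backward`, `factorized_kd_loss`, `brute_force_kd_loss`, `rand_matrix`) is hypothetical rather than taken from the authors' code. The sketch computes the factorized loss, log Z_s minus the teacher-marginal-weighted student scores, with forward-backward. It then checks the result against brute-force enumeration of every label sequence.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def seq_score(emit: Matrix, trans: Matrix, y: Sequence[int]) -> float:
    """Score of one label sequence: unary (position, label) scores plus
    transition (previous label, label) scores."""
    s = sum(emit[i][y[i]] for i in range(len(y)))
    return s + sum(trans[y[i - 1]][y[i]] for i in range(1, len(y)))


def forward_backward(emit: Matrix, trans: Matrix) -> Tuple[float, Matrix, List[Matrix]]:
    """Return log Z, unary marginals P(y_i = a) and pairwise marginals
    P(y_{i-1} = a, y_i = b) of a linear-chain CRF in O(n * k^2) time."""
    n, k = len(emit), len(emit[0])
    alpha = [list(emit[0])] + [[0.0] * k for _ in range(n - 1)]
    beta = [[0.0] * k for _ in range(n)]  # beta[n - 1] stays 0, i.e. log 1
    for i in range(1, n):
        for b in range(k):
            alpha[i][b] = emit[i][b] + logsumexp(
                [alpha[i - 1][a] + trans[a][b] for a in range(k)])
    for i in range(n - 2, -1, -1):
        for a in range(k):
            beta[i][a] = logsumexp(
                [trans[a][b] + emit[i + 1][b] + beta[i + 1][b] for b in range(k)])
    log_z = logsumexp(alpha[n - 1])
    unary = [[math.exp(alpha[i][a] + beta[i][a] - log_z) for a in range(k)]
             for i in range(n)]
    # pair[i - 1][a][b] is the marginal of the label pair at positions (i - 1, i)
    pair = [[[math.exp(alpha[i - 1][a] + trans[a][b] + emit[i][b] + beta[i][b] - log_z)
              for b in range(k)] for a in range(k)] for i in range(1, n)]
    return log_z, unary, pair


def factorized_kd_loss(t_emit: Matrix, t_trans: Matrix,
                       s_emit: Matrix, s_trans: Matrix) -> float:
    """log Z_s - sum_u P_t(u) * Score_s(u), without enumerating sequences."""
    _, t_unary, t_pair = forward_backward(t_emit, t_trans)
    s_log_z, _, _ = forward_backward(s_emit, s_trans)
    n, k = len(s_emit), len(s_emit[0])
    expected = sum(t_unary[i][a] * s_emit[i][a] for i in range(n) for a in range(k))
    expected += sum(t_pair[i - 1][a][b] * s_trans[a][b]
                    for i in range(1, n) for a in range(k) for b in range(k))
    return s_log_z - expected


def brute_force_kd_loss(t_emit: Matrix, t_trans: Matrix,
                        s_emit: Matrix, s_trans: Matrix) -> float:
    """Cross-entropy -sum_y P_t(y) log P_s(y) over all k**n label sequences."""
    n, k = len(s_emit), len(s_emit[0])
    ys = list(itertools.product(range(k), repeat=n))
    t_log_z = logsumexp([seq_score(t_emit, t_trans, y) for y in ys])
    s_log_z = logsumexp([seq_score(s_emit, s_trans, y) for y in ys])
    loss = 0.0
    for y in ys:
        p_t = math.exp(seq_score(t_emit, t_trans, y) - t_log_z)
        loss -= p_t * (seq_score(s_emit, s_trans, y) - s_log_z)
    return loss


def rand_matrix(rows: int, cols: int) -> Matrix:
    """Random placeholder scores (hypothetical, for illustration only)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, k = 4, 3  # 3**4 = 81 label sequences, small enough to enumerate
t_emit, t_trans = rand_matrix(n, k), rand_matrix(k, k)
s_emit, s_trans = rand_matrix(n, k), rand_matrix(k, k)
fast = factorized_kd_loss(t_emit, t_trans, s_emit, s_trans)
slow = brute_force_kd_loss(t_emit, t_trans, s_emit, s_trans)
print(math.isclose(fast, slow, abs_tol=1e-9))  # True
```

Enumeration costs O(k^n), but the factorized version costs O(n k^2) because it only needs the teacher's marginals and the student's partition function. The paper's other scenarios involve teachers and students that factorize differently. In those cases, the teacher's marginals over the student's substructures must be computed in other ways, and this sketch does not cover them.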
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Machine Learning and Data Classification
Methods: Knowledge Distillation
