Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

TL;DR
This paper systematically compares sentence-level and token-level knowledge distillation in neural machine translation, proposing a hybrid method that combines both to improve performance across different scenario complexities.
Contribution
It provides a comprehensive analysis of when to use sentence-level versus token-level distillation and introduces a novel hybrid approach that outperforms existing methods.
Findings
Token-level distillation is better for simple scenarios.
Sentence-level distillation excels in complex scenarios.
The hybrid method outperforms individual distillation techniques.
Abstract
Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation encompasses two primary methods: sentence-level distillation and token-level distillation. In sentence-level distillation, the student model is trained to align with the output of the teacher model, which can alleviate the training difficulty and give the student model a comprehensive understanding of global structure. In contrast, token-level distillation requires the student model to learn the output distribution of the teacher model, facilitating a more fine-grained transfer of knowledge. Studies have revealed divergent performances between sentence-level and token-level distillation across different scenarios, leading to the confusion on the empirical…
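The two objectives described in the abstract can be contrasted in a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the function names, the toy logits, and the use of a per-position argmax in place of the teacher's decoded translation (beam search is typical in practice) are all hypothetical choices made for brevity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_level_kd_loss(student_logits, teacher_logits):
    """Token-level distillation: at every position, the student matches the
    teacher's full output distribution (cross-entropy against soft targets)."""
    loss = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        p_teacher = softmax(t_row)
        p_student = softmax(s_row)
        loss -= sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return loss / len(student_logits)


def sentence_level_kd_loss(student_logits, teacher_tokens):
    """Sentence-level distillation: the student is trained on the teacher's
    generated output sequence (negative log-likelihood of hard targets)."""
    loss = 0.0
    for s_row, tok in zip(student_logits, teacher_tokens):
        loss -= math.log(softmax(s_row)[tok])
    return loss / len(teacher_tokens)


# Toy example: 2 target positions, vocabulary of 3 tokens (hypothetical values).
teacher_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
student_logits = [[1.0, 0.8, 0.2], [0.1, 0.9, 0.4]]

# Assumption: greedy per-position argmax stands in for the teacher's decoded
# translation; a real pipeline would decode a full sentence from the teacher.
teacher_tokens = [max(range(len(row)), key=row.__getitem__) for row in teacher_logits]

token_loss = token_level_kd_loss(student_logits, teacher_logits)
sentence_loss = sentence_level_kd_loss(student_logits, teacher_tokens)
print(f"token-level loss: {token_loss:.4f}")
print(f"sentence-level loss: {sentence_loss:.4f}")
```

The difference lies in the target: the token-level loss uses the teacher's entire distribution at each position, while the sentence-level loss sees only the tokens the teacher actually produced. When the teacher's distribution is nearly one-hot, the two losses become almost identical.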
Taxonomy
Topics: Knowledge Management and Sharing
Methods: Knowledge Distillation · ALIGN
