Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura

TL;DR
This paper introduces a hierarchical knowledge distillation approach that enables small dialogue sequence labeling models to retain the complex contextual understanding of large models, improving performance on tasks like dialogue act estimation.
Contribution
It proposes a novel hierarchical knowledge distillation method that preserves contextual information from large models in smaller, deployable models for dialogue sequence labeling.
Findings
Improved accuracy in dialogue act estimation.
Effective preservation of hierarchical context knowledge.
Small models trained with the proposed method outperform those trained with baseline distillation methods.
Abstract
This paper presents a novel knowledge distillation method for dialogue sequence labeling. Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation. Accurate labeling is often realized by a hierarchically structured large model consisting of utterance-level and dialogue-level networks that capture the contexts within an utterance and between utterances, respectively. However, due to its large model size, such a model cannot be deployed on resource-constrained devices. To overcome this difficulty, we focus on knowledge distillation, which trains a small model by distilling the knowledge of a large, high-performance teacher model. Our key idea is to distill the knowledge while keeping the complex contexts captured by the teacher model. To this end,…
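As a rough illustration of the distillation setting the abstract describes, the sketch below computes a generic soft-label distillation loss over the utterances of one dialogue. It is not the paper's hierarchical method, whose details are cut off above; it only shows the standard idea of training a student on a mix of gold labels and temperature-softened teacher outputs. The function names, the `alpha` mixing weight, and the `temperature` value are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert label scores to probabilities, optionally softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def dialogue_distillation_loss(teacher_logits, student_logits, gold_labels,
                               alpha=0.5, temperature=2.0):
    """Average per-utterance loss for one dialogue (hypothetical helper).

    teacher_logits / student_logits: one list of label scores per utterance.
    gold_labels: one integer label index per utterance.
    alpha: weight on the teacher's soft targets (assumed value).
    """
    total = 0.0
    for t_logits, s_logits, gold in zip(teacher_logits, student_logits, gold_labels):
        # Hard-label term: student prediction against the gold label.
        one_hot = [1.0 if i == gold else 0.0 for i in range(len(s_logits))]
        hard_loss = cross_entropy(one_hot, softmax(s_logits))
        # Soft-label term: student against the softened teacher distribution.
        soft_targets = softmax(t_logits, temperature)
        soft_loss = cross_entropy(soft_targets, softmax(s_logits, temperature))
        total += (1.0 - alpha) * hard_loss + alpha * soft_loss
    return total / len(gold_labels)


# Toy dialogue with two utterances and three candidate labels.
teacher = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
agreeing_student = [[4.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
disagreeing_student = [[0.0, 4.0, 0.0], [0.0, 0.0, 3.0]]
gold = [0, 1]

loss_agree = dialogue_distillation_loss(teacher, agreeing_student, gold)
loss_disagree = dialogue_distillation_loss(teacher, disagreeing_student, gold)
```

In this toy case the student that agrees with both the teacher and the gold labels receives the lower loss. A real hierarchical labeler would produce each utterance's scores from an utterance-level encoder followed by a dialogue-level network, rather than from fixed lists as here.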
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Knowledge Distillation
