TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won Ik Cho, and Nam Soo Kim

arXiv:2008.00671·eess.AS·September 20, 2021·IEEE ACM Trans. Audio Speech Lang. Process.

TL;DR

TutorNet introduces a flexible knowledge distillation approach for end-to-end speech recognition, enabling knowledge transfer across different network architectures and improving student model performance beyond the teacher in some cases.

Contribution

The paper proposes TutorNet, a novel knowledge distillation (KD) method that transfers knowledge between different types of neural networks at multiple levels, making model selection more flexible and improving the performance of speech recognition models.

Findings

01 Significantly reduces word error rate (WER) on the LibriSpeech dataset.

02 Allows student models to outperform their teachers in certain scenarios.

03 Enables knowledge transfer between networks with different topologies.

Abstract

In recent years, there has been a great deal of research on end-to-end speech recognition models, which simplify the traditional pipeline and achieve promising results. Despite their remarkable performance improvements, end-to-end models typically incur a high computational cost to perform well. To reduce this computational burden, knowledge distillation (KD), a popular model compression method, has been used to transfer knowledge from a deep and complex model (teacher) to a shallower and simpler model (student). Previous KD approaches have commonly designed the architecture of the student model by reducing the width per layer or the number of layers of the teacher model. This structural reduction scheme might limit the flexibility of model selection, since the student model structure must be similar to that of the given teacher. To…

Taxonomy

Methods: Knowledge Distillation