Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Mun-Hak Lee; Joon-Hyuk Chang

arXiv:2110.10429 · cs.LG · October 22, 2021

TL;DR

This paper introduces a hierarchical multi-task learning approach for cross-modal knowledge distillation from language models to acoustic models, improving speech recognition performance by leveraging different units and auxiliary outputs.

Contribution

It proposes a novel hierarchical distillation framework with auxiliary layers, enhancing knowledge transfer between language and acoustic models beyond existing label-interpolation methods.

Findings

01 Hierarchical distillation improves speech recognition accuracy.

02 Auxiliary output layers enhance knowledge transfer.

03 The method outperforms traditional label-interpolation distillation.

Abstract

The remarkable performance of the pre-trained language model (LM) using self-supervised learning has led to a major paradigm shift in the study of natural language processing. In line with these changes, leveraging the performance of speech recognition systems with massive deep learning-based LMs is a major topic of speech recognition research. Among the various methods of applying LMs to speech recognition systems, in this paper, we focus on a cross-modal knowledge distillation method that transfers knowledge between two types of deep neural networks with different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a…
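The contrast the abstract draws, between label-interpolation distillation and a multi-task setup with auxiliary output layers, can be sketched in a few lines. This is a minimal illustration only, not the authors' formulation: the function names, the interpolation weight `alpha`, the per-head weights, and the toy three-token vocabulary are all assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    logp = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target_probs, logp))


def interpolated_target(hard_index, lm_probs, alpha):
    """Label interpolation: mix the one-hot label with the LM soft label.

    alpha is a hypothetical mixing weight; alpha = 0 recovers the one-hot label.
    """
    return [
        (1.0 - alpha) * (1.0 if i == hard_index else 0.0) + alpha * p
        for i, p in enumerate(lm_probs)
    ]


def multitask_loss(head_losses, weights):
    """Weighted sum of per-head losses (main head plus auxiliary heads).

    The weights are illustrative assumptions, not values from the paper.
    """
    return sum(w * l for w, l in zip(weights, head_losses))


# Toy example with a 3-token vocabulary (all numbers are made up).
lm_probs = [0.6, 0.3, 0.1]          # soft label from a language model
main_logits = [2.0, 0.5, -1.0]      # acoustic model main output
aux_logits = [1.0, 1.0, 0.0]        # acoustic model auxiliary output

target = interpolated_target(0, lm_probs, alpha=0.5)
main_loss = cross_entropy(target, main_logits)
aux_loss = cross_entropy(lm_probs, aux_logits)
total = multitask_loss([main_loss, aux_loss], [1.0, 0.3])
```

In the interpolation-only variant, a single output layer is trained on the mixed target. In the multi-task sketch, the soft label supervises a separate auxiliary output, so the main output's target is not forced to trade off against it; how the paper actually arranges its auxiliary layers is not specified in this summary.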

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Knowledge Distillation