Essence Knowledge Distillation for Speech Recognition

Zhenchuan Yang; Chun Zhang; Weibin Zhang; Jianxiu Jin; Dongpeng Chen

arXiv:1906.10834 · cs.CL · June 27, 2019 · 6 citations

Open Access

TL;DR

This paper introduces a novel knowledge distillation method for speech recognition that selectively uses ensemble outputs and combines them with hard labels, resulting in a more efficient and accurate single model.

Contribution

It proposes a selective distillation approach that filters ensemble outputs and employs multitask learning, improving speech recognition accuracy over traditional methods.

Findings

1. The method outperforms single models trained only on hard labels.

2. The student model surpasses the teacher model in accuracy.

3. Selective distillation reduces computational costs while maintaining high performance.

Abstract

It is well known that a speech recognition system that combines multiple acoustic models trained on the same data significantly outperforms a single-model system. Unfortunately, real-time speech recognition using a whole ensemble of models is too computationally expensive. In this paper, we propose to distill the essence knowledge in an ensemble of models (i.e., the teacher model) into a single model (i.e., the student model) that needs much less computation to deploy. Previously, all the softened outputs of the teacher model were used to optimize the student model. We argue that not all the outputs of the ensemble need to be distilled. Some of the outputs may even contain noisy information that is useless or even harmful to the training of the student model. In addition, we propose to train the student model with a multitask learning approach by utilizing both the softened outputs…
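The two ideas named in the abstract, distilling only part of the teacher's output and training the student on both soft and hard targets, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the teacher is formed by averaging the members' softened posteriors, the "essence" is taken to be the k largest posteriors per frame (renormalized), and the two losses are mixed with a hypothetical weight `alpha`. The function names are likewise hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_posteriors(member_logits, temperature=1.0):
    """Teacher output: mean of the members' softened posteriors (assumed)."""
    return np.mean([softmax(l, temperature) for l in member_logits], axis=0)


def keep_top_k(posteriors, k):
    """Assumed selection rule: keep the k largest entries per frame,
    zero the rest, and renormalize each frame to sum to 1."""
    idx = np.argpartition(posteriors, -k, axis=-1)[..., -k:]
    mask = np.zeros_like(posteriors, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    kept = np.where(mask, posteriors, 0.0)
    return kept / kept.sum(axis=-1, keepdims=True)


def multitask_loss(student_logits, teacher_soft, hard_labels, alpha=0.5):
    """Weighted sum of cross-entropy against the filtered soft targets
    and cross-entropy against the hard labels (alpha is hypothetical)."""
    log_p = np.log(softmax(student_logits) + 1e-12)
    soft_ce = -(teacher_soft * log_p).sum(axis=-1).mean()
    hard_ce = -log_p[np.arange(len(hard_labels)), hard_labels].mean()
    return alpha * soft_ce + (1.0 - alpha) * hard_ce


# Toy usage: 3 ensemble members, 4 frames, 6 output classes.
rng = np.random.default_rng(0)
members = [rng.normal(size=(4, 6)) for _ in range(3)]
teacher = ensemble_posteriors(members, temperature=2.0)
essence = keep_top_k(teacher, k=2)
labels = np.array([0, 1, 2, 3])
loss = multitask_loss(rng.normal(size=(4, 6)), essence, labels, alpha=0.5)
```

In a real system the student logits would come from a trainable acoustic model and the loss would be minimized by gradient descent; the sketch only shows how the filtered soft targets and the hard labels could enter a single objective.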

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing