Efficient Knowledge Distillation for RNN-Transducer Models

Sankaran Panchapagesan; Daniel S. Park; Chung-Cheng Chiu; Yuan Shangguan; Qiao Liang; Alexander Gruenstein

arXiv:2011.06110·eess.AS·November 13, 2020


TL;DR

This paper introduces an efficient knowledge distillation method for RNN-Transducer models, improving speech recognition accuracy, especially for sparse models, with simple loss functions and broad applicability across datasets.

Contribution

The paper proposes a novel, simple distillation loss for RNN-T models that enhances accuracy of sparse models and is effective across multiple speech recognition datasets.

Findings

01

WER reductions of 4.3% and 12.1% for 60% and 90% sparse multi-domain RNN-T models, respectively, on a noisy dataset

02

4.8% relative WER reduction on LibriSpeech test-other

03

Distillation is effective both for pruned (sparse) models and for small models

Abstract

Knowledge Distillation is an effective method of transferring knowledge from a large model to a smaller model. Distillation can be viewed as a type of model compression, and has played an important role for on-device ASR applications. In this paper, we develop a distillation method for RNN-Transducer (RNN-T) models, a popular end-to-end neural network architecture for streaming speech recognition. Our proposed distillation loss is simple and efficient, and uses only the "y" and "blank" posterior probabilities from the RNN-T output probability lattice. We study the effectiveness of the proposed approach in improving the accuracy of sparse RNN-T models obtained by gradually pruning a larger uncompressed model, which also serves as the teacher during distillation. With distillation of 60% and 90% sparse multi-domain RNN-T models, we obtain WER reductions of 4.3% and 12.1% respectively, on…
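The abstract says the loss uses only the "y" and "blank" posteriors at each node of the RNN-T output lattice. One plausible way to turn that into a loss, sketched below, is to collapse each node's vocabulary distribution into three classes (next correct label y, blank, and everything else) and sum the KL divergence from teacher to student over all lattice nodes. The three-way collapse, the function names, the nested-list lattice layout, and the handling of the final label position are all assumptions made for illustration; this is not taken from the paper's text as quoted here.

```python
import math

def collapse(probs, blank_index, y_index=None):
    """Reduce a full vocabulary posterior to [P(y), P(blank), P(rest)].

    If y_index is None (no next label exists), return [P(blank), P(rest)].
    """
    p_blank = probs[blank_index]
    if y_index is None:
        return [p_blank, max(1.0 - p_blank, 0.0)]
    p_y = probs[y_index]
    return [p_y, p_blank, max(1.0 - p_y - p_blank, 0.0)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two small discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)

def distillation_loss(teacher, student, labels, blank_index=0):
    """Sum of collapsed KL terms over the lattice.

    teacher, student: lattice[t][u] is a list of vocabulary probabilities,
                      with t in 0..T-1 and u in 0..U (U = len(labels)).
    labels:           target label ids; labels[u] is the correct next label
                      at lattice column u.
    Assumption: at u == U there is no next label, so only blank vs. rest
    is compared.
    """
    total = 0.0
    for t_row, s_row in zip(teacher, student):
        for u, (t_probs, s_probs) in enumerate(zip(t_row, s_row)):
            y_index = labels[u] if u < len(labels) else None
            p = collapse(t_probs, blank_index, y_index)
            q = collapse(s_probs, blank_index, y_index)
            total += kl_divergence(p, q)
    return total
```

Because each node contributes only two or three terms instead of a full vocabulary-sized sum, the cost of this sketch grows with the lattice size (T times U+1) rather than with the vocabulary size, which is consistent with the "simple and efficient" description in the abstract.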
