Reducing the gap between streaming and non-streaming Transducer-based ASR by adaptive two-stage knowledge distillation

Haitao Tang, Yu Fu, Lei Sun, Jiabin Xue, Dan Liu, Yongchao Li, Zhiqiang Ma, Minghui Wu, Jia Pan, Genshun Wan, and Ming'en Zhao

arXiv:2306.15171 · cs.CL · June 28, 2023

TL;DR

This paper introduces an adaptive two-stage knowledge distillation approach that narrows the performance gap between streaming and non-streaming Transducer-based ASR models, achieving a 19% relative WER reduction on LibriSpeech and a faster first-token response than the original streaming model.

Contribution

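The paper proposes a two-stage distillation method: hidden-layer learning with a mean squared error loss, followed by output-layer learning with power-transformation-based adaptive smoothing. The aim is to improve streaming ASR accuracy by making the streaming model's hidden and output distributions consistent with those of the non-streaming model.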

Findings

01  19% relative WER reduction on LibriSpeech

02  Faster first-token response than the original streaming model

03  Effective alignment of hidden and output distributions

Abstract

The Transducer is one of the mainstream frameworks for streaming speech recognition. There is a performance gap between streaming and non-streaming Transducer models because the streaming model has limited context. An effective way to reduce this gap is to ensure that their hidden and output distributions are consistent, which can be achieved by hierarchical knowledge distillation. However, it is difficult to ensure consistency of both distributions simultaneously, because learning the output distribution depends on the hidden one. In this paper, we propose an adaptive two-stage knowledge distillation method consisting of hidden layer learning and output layer learning. In the former stage, we learn hidden representations with full context by applying a mean squared error loss function. In the latter stage, we design a power-transformation-based adaptive smoothness method to learn a stable output distribution. It…
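The two stages described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the fixed exponent `gamma`, and the use of a KL divergence in the second stage are all assumptions made for the example. In particular, the abstract does not state how the smoothing strength is adapted, so `gamma` is simply passed in by hand here.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax over the vocabulary axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def hidden_mse_loss(student_hidden, teacher_hidden):
    """Stage 1 (sketch): mean squared error between the streaming
    (student) and non-streaming (teacher) hidden representations."""
    return float(np.mean((student_hidden - teacher_hidden) ** 2))


def power_smooth(probs, gamma, axis=-1):
    """Power transformation of a distribution: p_i^gamma / sum_j p_j^gamma.
    gamma < 1 flattens the distribution; gamma = 1 leaves it unchanged.
    How gamma is chosen adaptively is NOT given in the abstract."""
    powered = np.power(probs, gamma)
    return powered / powered.sum(axis=axis, keepdims=True)


def output_kd_loss(student_logits, teacher_logits, gamma=0.5, eps=1e-12):
    """Stage 2 (sketch): KL divergence from the smoothed teacher output
    distribution to the student output distribution. The KL form is an
    assumption for this example."""
    target = power_smooth(softmax(teacher_logits), gamma)
    student = softmax(student_logits)
    kl = np.sum(target * (np.log(target + eps) - np.log(student + eps)), axis=-1)
    return float(np.mean(kl))


# Hypothetical shapes: (frames, hidden_dim) and (frames, vocab_size).
rng = np.random.default_rng(0)
teacher_hidden = rng.normal(size=(4, 8))
student_hidden = teacher_hidden + 0.1 * rng.normal(size=(4, 8))
teacher_logits = rng.normal(size=(4, 5))
student_logits = rng.normal(size=(4, 5))

stage1 = hidden_mse_loss(student_hidden, teacher_hidden)
stage2 = output_kd_loss(student_logits, teacher_logits, gamma=0.5)
print(f"stage 1 (hidden MSE): {stage1:.4f}")
print(f"stage 2 (smoothed KD): {stage2:.4f}")
```

Because every function reduces over the last axis, the same sketch would apply to a Transducer output lattice of shape (frames, labels, vocab_size). Running the stages one after the other, rather than summing both losses from the start, reflects the abstract's point that learning the output distribution depends on the hidden one.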

Taxonomy

Methods: Knowledge Distillation