FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

Zhengkun Tian; Jiangyan Yi; Ye Bai; Jianhua Tao; Shuai Zhang; Zhengqi Wen

arXiv:2104.02882 · eess.AS · April 8, 2021

TL;DR

This paper introduces fast-skip regularization for transducer models in speech recognition, enabling faster inference by predicting and skipping blank tokens, achieving nearly fourfold speedup with minimal accuracy loss.

Contribution

It proposes a novel regularization method that aligns transducer blank predictions with CTC outputs, significantly accelerating inference in speech recognition models.

Findings

01. Inference speed increased nearly 4 times.

02. Minimal performance degradation observed.

03. Effective blank token prediction and skipping achieved.

Abstract

Transducer-based models, such as RNN-Transducer and transformer-transducer, have achieved great success in speech recognition. A typical transducer model decodes the output sequence step by step, conditioned on the current acoustic state and the previously predicted tokens. Statistically, blank tokens account for nearly 90% of all tokens in the prediction results. Predicting the blank tokens takes a lot of computation and time, but only the non-blank tokens appear in the final output sequence. Therefore, we propose a method named fast-skip regularization, which tries to align the blank positions predicted by a transducer with those predicted by a CTC model. During inference, the transducer model can predict the blank tokens in advance with a simple CTC projection layer, without many complicated forward calculations of the transducer decoder, and then skip them, which will…
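The skipping idea described in the abstract can be illustrated with a minimal greedy-decoding sketch. This is not the authors' implementation: the function name, the `joint_step` callback (a stand-in for the prediction and joint networks), the blank id of 0, and the fixed `blank_threshold` rule for deciding that the CTC output is blank are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0  # assumed id of the blank token


def greedy_decode_fast_skip(
    enc_frames: Sequence[object],
    ctc_probs: Sequence[Sequence[float]],
    joint_step: Callable[[object, List[int]], Sequence[float]],
    blank_threshold: float = 0.9,  # hypothetical skip rule, not from the paper
    max_symbols: int = 5,
) -> Tuple[List[int], int]:
    """Greedy transducer decoding that skips frames the CTC layer calls blank.

    enc_frames: encoder outputs, one entry per frame.
    ctc_probs:  per-frame posteriors from a cheap CTC projection of the encoder.
    joint_step: hypothetical callback returning vocabulary scores for one
                (frame, token history) pair; stands in for the costly decoder.
    Returns the decoded token ids and the number of joint_step calls made.
    """
    hyp: List[int] = []
    joint_calls = 0
    for frame, probs in zip(enc_frames, ctc_probs):
        # Cheap check first: if CTC is confident this frame is blank,
        # skip it without running the transducer decoder at all.
        if probs[BLANK] >= blank_threshold:
            continue
        # Otherwise run the usual per-frame loop until blank is emitted.
        for _ in range(max_symbols):
            scores = joint_step(frame, hyp)
            joint_calls += 1
            token = max(range(len(scores)), key=scores.__getitem__)
            if token == BLANK:
                break
            hyp.append(token)
    return hyp, joint_calls
```

Setting `blank_threshold` above 1.0 disables skipping, which gives a baseline for comparing the number of `joint_step` calls. The sketch only helps if the transducer and CTC blank positions agree, which is what the proposed regularization is meant to encourage.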

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing