HMM-Free Encoder Pre-Training for Streaming RNN Transducer

Lu Huang; Jingyu Sun; Yufeng Tang; Junfeng Hou; Jinkun Chen; Jun Zhang; Zejun Ma

arXiv:2104.10764·eess.AS·June 14, 2021

TL;DR

This paper introduces an HMM-free encoder pre-training method for streaming RNN transducers that uses frame-wise labels derived from a CTC model, improving accuracy and reducing emission latency without requiring an HMM-based alignment system.

Contribution

It presents what the authors describe as, to the best of their knowledge, the first use of a CTC model to simulate HMM-based frame-wise labels, which are then used to pre-train the encoder of a streaming RNN-T model in an all-neural framework.

Findings

01

Reduces WER by a relative 5-11% compared to random initialization.

02

Decreases emission latency by 60 ms.

03

Works effectively on LibriSpeech and MLS English tasks.

Abstract

This work describes an encoder pre-training procedure that uses frame-wise labels to improve the training of a streaming recurrent neural network transducer (RNN-T) model. A streaming RNN-T trained from scratch usually performs worse than a non-streaming RNN-T. Although it is common to address this issue by pre-training components of the RNN-T with other criteria or with frame-wise alignment guidance, the alignment is not easily available in an end-to-end manner. In this work, the frame-wise alignment used to pre-train the streaming RNN-T's encoder is generated without an HMM-based system, so an all-neural framework with HMM-free encoder pre-training is constructed. This is achieved by expanding the spikes of a CTC model to their left/right blank frames, and two expanding strategies are proposed. To the best of our knowledge, this is the first work to simulate HMM-based frame-wise labels using a CTC model…
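
To make the spike-expansion idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not spell out the two expanding strategies, so the rule used below (each blank frame takes the label of its nearest spike, with ties going to the left spike) is an assumption chosen only for illustration. The function name expand_spikes, the blank index 0, and the input format (one greedy CTC label per frame) are likewise hypothetical.

```python
from typing import List

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def expand_spikes(frame_labels: List[int], blank: int = BLANK) -> List[int]:
    """Turn spiky per-frame CTC labels into dense frame-wise labels.

    Assumed rule (not taken from the paper): every blank frame is given the
    label of its nearest non-blank spike; a frame exactly halfway between
    two spikes is assigned to the left one.
    """
    # Positions of all non-blank frames ("spikes").
    spikes = [t for t, lab in enumerate(frame_labels) if lab != blank]
    out = list(frame_labels)
    if not spikes:
        # Nothing to expand: the utterance is all blank.
        return out

    # Leading blanks take the label of the first spike.
    for t in range(spikes[0]):
        out[t] = frame_labels[spikes[0]]

    # Trailing blanks take the label of the last spike.
    for t in range(spikes[-1] + 1, len(out)):
        out[t] = frame_labels[spikes[-1]]

    # Blanks between two spikes are split at the midpoint of the gap.
    for left, right in zip(spikes, spikes[1:]):
        gap = right - left - 1
        half = (gap + 1) // 2  # left spike gets the extra frame when gap is odd
        for t in range(left + 1, left + 1 + half):
            out[t] = frame_labels[left]
        for t in range(left + 1 + half, right):
            out[t] = frame_labels[right]
    return out


# Two spikes (labels 3 and 5) separated by three blank frames.
print(expand_spikes([0, 0, 3, 0, 0, 0, 5, 0]))  # [3, 3, 3, 3, 3, 5, 5, 5]
```

The dense labels produced this way could then serve as per-frame targets for a cross-entropy pre-training pass over the encoder, which is the role the abstract assigns to the frame-wise alignment.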

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling