Exploring RNN-Transducer for Chinese Speech Recognition

Senmao Wang; Pan Zhou; Wei Chen; Jia Jia; Lei Xie

arXiv:1811.05097 · cs.CL · April 24, 2019 · 6 cites

TL;DR

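This paper investigates the RNN-Transducer (RNN-T) for Chinese large vocabulary continuous speech recognition. It proposes a learning rate decay strategy to speed up convergence and adds convolutional layers at the start of the network to simplify training, reaching a 16.9% character error rate (CER) and surpassing previous models.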

Contribution

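The study introduces a new learning rate decay strategy for RNN-T and adds convolutional layers at the beginning of the network. Combined with ordered training data, these changes remove the encoder pre-training step, simplifying training while maintaining performance.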

Findings

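01 Achieved a 16.9% character error rate (CER) on Chinese speech recognition

02 Proposed a new learning rate decay strategy to accelerate convergence

03 Added convolutional layers at the start of the network and used ordered data, removing the need for encoder pre-training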

Abstract

End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system. RNN transducer (RNN-T) is one of the popular end-to-end methods. Previous studies have shown that RNN-T is difficult to train and a very complex training process is needed for a reasonable performance. In this paper, we explore RNN-T for a Chinese large vocabulary continuous speech recognition (LVCSR) task and aim to simplify the training process while maintaining performance. First, a new strategy of learning rate decay is proposed to accelerate the model convergence. Second, we find that adding convolutional layers at the beginning of the network and using ordered data can discard the pre-training process of the encoder without loss of performance. Besides, we design experiments to find a balance among the usage of GPU memory,…
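The abstract names two training ideas, ordered data and learning rate decay, without giving their details in the text shown here. The sketch below illustrates one plausible reading and is not the authors' method. It assumes that "ordered data" means presenting utterances from shortest to longest, and that the decay is a fixed multiplicative factor applied once per epoch after a short warm period. The names `ordered_batches`, `decayed_lr`, `decay`, and `start_epoch`, and the `(id, frame_count)` utterance format, are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical utterance record: (utterance id, number of acoustic frames).
Utterance = Tuple[str, int]


def ordered_batches(utts: Sequence[Utterance], batch_size: int) -> List[List[Utterance]]:
    """Sort utterances from shortest to longest, then cut into fixed-size batches.

    Assumption: "ordered data" is read here as length-sorted data.
    """
    ordered = sorted(utts, key=lambda u: u[1])
    return [list(ordered[i:i + batch_size]) for i in range(0, len(ordered), batch_size)]


def decayed_lr(base_lr: float, epoch: int, decay: float = 0.5, start_epoch: int = 2) -> float:
    """Keep base_lr before start_epoch, then multiply by `decay` once per epoch.

    Assumption: a plain exponential schedule, chosen only for illustration.
    """
    steps = max(0, epoch - start_epoch + 1)
    return base_lr * decay ** steps


if __name__ == "__main__":
    utts = [("a", 300), ("b", 120), ("c", 450), ("d", 80), ("e", 200)]
    for epoch in range(4):
        lr = decayed_lr(1e-3, epoch)
        batches = ordered_batches(utts, batch_size=2)
        print(epoch, lr, [[uid for uid, _ in batch] for batch in batches])
```

Length-sorted batches also keep padding small within each batch, which bears on the GPU memory trade-off the abstract mentions, though the paper's own batching scheme may differ.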

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling