Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification
Kyuyeon Hwang, Wonyong Sung

TL;DR
This paper introduces an online CTC training algorithm that lets RNNs learn from sequences longer than the unrolling window, making sequence training for speech recognition more memory-efficient and scalable.
Contribution
The authors propose an EM-based online CTC algorithm that supports training RNNs on sequences longer than the unrolling window, facilitating online learning and efficient GPU parallelization.
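To make the idea of an unrolling window concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it only shows that a unidirectional RNN can be run forward in fixed-size windows, carrying the hidden state from one window to the next, and still produce the same outputs as a single pass over the whole sequence. The scalar tanh RNN, its weights `w` and `u`, and the function names are all assumptions made for illustration. The EM-based handling of the CTC labels is not shown.

```python
import math

def rnn_forward(xs, h0, w=0.5, u=0.9):
    """Run a toy scalar tanh RNN over xs; return per-step outputs and the final state."""
    h = h0
    outs = []
    for x in xs:
        h = math.tanh(w * x + u * h)
        outs.append(h)
    return outs, h

def windowed_forward(xs, unroll, h0=0.0):
    """Process xs in consecutive windows of at most `unroll` steps, carrying the state across."""
    outs = []
    h = h0
    for start in range(0, len(xs), unroll):
        window_outs, h = rnn_forward(xs[start:start + unroll], h)
        outs.extend(window_outs)
    return outs

xs = [0.1 * i for i in range(10)]       # hypothetical input stream of 10 frames
full, _ = rnn_forward(xs, 0.0)          # one pass over the whole sequence
chunked = windowed_forward(xs, unroll=4)  # windows of 4, 4, and 2 frames
```

Only one window of activations has to be held at a time in the windowed version, which is where the memory saving during training would come from. The harder part, which the paper addresses, is computing a useful CTC training signal when the label sequence spans many windows.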
Findings
Achieved a 20.7% phoneme error rate (PER) on concatenated TIMIT sequences.
Trained a WSJ speech model with only 64 unrolling steps, reducing memory usage.
Maintained a competitive word error rate (WER) with significantly less unrolling.
Abstract
Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of training sequences is usually not uniform, which makes parallel training with multiple sequences inefficient on shared memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an…
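For readers unfamiliar with why standard CTC training spans the whole input, the sketch below computes the standard CTC forward variable over every frame of a sequence. This is the conventional full-sequence CTC recursion, not the online algorithm introduced in the paper. The function name `ctc_forward`, the list-of-lists layout of `probs`, and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
def ctc_forward(probs, labels, blank=0):
    """Return p(labels | probs) under standard CTC.

    probs[t][k] is the network's probability of symbol k at frame t.
    """
    # Extended label sequence: blanks around and between the labels.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(probs)

    # Initialisation: a path may start with the first blank or the first label.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    # Recursion over every frame of the input sequence.
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)

# Two frames, two symbols (blank=0, "a"=1), uniform outputs.
uniform = [[0.5, 0.5], [0.5, 0.5]]
p = ctc_forward(uniform, [1])
```

In the usage example, the three alignments "a-", "-a", and "aa" each have probability 0.25, so `p` is 0.75. The CTC gradient also needs a matching backward recursion over the same frames, so the RNN activations for the whole sequence must be kept, and this is the unrolling cost the abstract refers to.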
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Neural Networks and Applications
