Pruned RNN-T for fast, memory-efficient ASR training

Fangjun Kuang; Liyong Guo; Wei Kang; Long Lin; Mingshuang Luo; Zengwei Yao; Daniel Povey

arXiv:2206.13236 · eess.AS · June 28, 2022

TL;DR

This paper proposes a pruning method for RNN-T loss computation that significantly improves speed and reduces memory usage, enabling more practical training for large-vocabulary speech recognition systems.

Contribution

It introduces a pruning approach that bounds the RNN-T recursion, making training faster and more memory-efficient, especially for large vocabularies.
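For readers new to RNN-T: the recursion being bounded is the forward (alpha) recursion over a lattice of nodes (t, u), where t frames have been consumed and u labels emitted. The NumPy sketch below is our own illustration, not the paper's implementation. The function name `rnnt_log_likelihood` and the bound parameters `s_begin` (first label position kept at frame t) and `S` (number of label positions kept per frame) are hypothetical. It shows only how per-frame bounds restrict the recursion, not how the paper chooses them.

```python
import numpy as np


def rnnt_log_likelihood(log_blank, log_sym, s_begin=None, S=None):
    """Forward (alpha) recursion over the RNN-T lattice.

    log_blank: (T, U+1) array, log P(blank | frame t, u labels emitted)
    log_sym:   (T, U) array,   log P(y[u] | frame t, u labels emitted)
    s_begin, S: optional pruning bounds (hypothetical names). Node (t, u)
        takes part only if s_begin[t] <= u < s_begin[t] + S; every other
        node is treated as having probability zero.
    Returns log P(y | x) summed over all alignments that survive pruning.
    """
    T, U1 = log_blank.shape
    alpha = np.full((T, U1), -np.inf)
    for t in range(T):
        for u in range(U1):
            if s_begin is not None and not (s_begin[t] <= u < s_begin[t] + S):
                continue  # pruned node: alpha stays -inf
            if t == 0 and u == 0:
                alpha[t, u] = 0.0  # every alignment starts here
                continue
            terms = []
            if t > 0:  # arrive by emitting blank at (t-1, u)
                terms.append(alpha[t - 1, u] + log_blank[t - 1, u])
            if u > 0:  # arrive by emitting label y[u-1] at (t, u-1)
                terms.append(alpha[t, u - 1] + log_sym[t, u - 1])
            alpha[t, u] = np.logaddexp.reduce(terms)
    # Every alignment ends with a blank emitted at the final node (T-1, U).
    return alpha[T - 1, U1 - 1] + log_blank[T - 1, U1 - 1]


# Toy usage: random per-node distributions over a hypothetical
# 6-token vocabulary in which id 0 is blank.
rng = np.random.default_rng(0)
T, U, V = 4, 3, 6
logits = rng.normal(size=(T, U + 1, V))
log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
y = [2, 5, 1]
log_blank = log_probs[:, :, 0]
log_sym = np.stack([log_probs[:, u, y[u]] for u in range(U)], axis=1)

full = rnnt_log_likelihood(log_blank, log_sym)
pruned = rnnt_log_likelihood(log_blank, log_sym, s_begin=[0, 1, 1, 2], S=2)
# pruned <= full: the bounds only remove alignments, they never add any.
```

Without bounds the loop visits all T × (U+1) nodes. With bounds it visits T × S. The scalar Python loops are for readability only; a practical implementation would vectorize the recursion or run it on the GPU.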

Findings

01. Achieves faster RNN-T loss computation
02. Reduces GPU memory usage during training
03. Enables training with larger vocabularies
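Some rough arithmetic shows why the memory savings matter most for large vocabularies. The joiner produces a V-way output at every lattice node, so a full RNN-T loss materializes a (B, T, U+1, V) tensor. Evaluating the joiner only inside a band of S label positions per frame needs (B, T, S, V). The sizes below are hypothetical (for example, a Chinese-character-scale vocabulary of 5,000 and a pruning width of 5) and are not figures from the paper. The count also ignores gradients and intermediate joiner activations, which add more.

```python
def joiner_output_bytes(batch, frames, positions, vocab, bytes_per_value=4):
    """Bytes for a (batch, frames, positions, vocab) float32 tensor of joiner outputs."""
    return batch * frames * positions * vocab * bytes_per_value


B, T, U, V = 8, 500, 100, 5000  # hypothetical batch, frames, labels, vocabulary
S = 5                           # hypothetical pruning width (label positions kept per frame)

full = joiner_output_bytes(B, T, U + 1, V)  # every (t, u) node of the lattice
pruned = joiner_output_bytes(B, T, S, V)    # only S nodes per frame
print(f"full:   {full / 2**30:.1f} GiB")    # full:   7.5 GiB
print(f"pruned: {pruned / 2**30:.2f} GiB")  # pruned: 0.37 GiB
print(f"ratio:  {full / pruned:.1f}x")      # ratio:  20.2x
```

The ratio (U+1)/S does not depend on V, but the absolute size grows linearly with V. That is why the full loss becomes impractical with large vocabularies.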

Abstract

The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition. One of the drawbacks of RNN-T is that its loss function is relatively slow to compute, and can use a lot of memory. Excessive GPU memory usage can make it impractical to use RNN-T loss in cases where the vocabulary size is large: for example, for Chinese character-based ASR. We introduce a method for faster and more memory-efficient RNN-T loss computation. We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is linear in the encoder and decoder embeddings; we can evaluate this without using much memory. We then use those pruning bounds to evaluate the full, non-linear joiner network.
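The two steps described in the abstract can be sketched in NumPy as follows. This is our illustration under stated assumptions, not the authors' k2/icefall code. We assume the "simple" joiner computes am[t] + lm[u], where am and lm are linear projections of the encoder and decoder embeddings to vocabulary size. We also assume the pruning bounds arrive as a per-frame start index `s_begin` with a fixed width `S`. Linearity lets the log-softmax normalizer over V be computed with one (T × V)·(V × (U+1)) matrix product, so only T × (U+1) values are stored. The non-linear joiner (here a hypothetical tanh layer) is then evaluated on T × S nodes only. All function names, weights, and sizes are hypothetical. How `s_begin` is derived from the simple joiner's lattice is not shown; the bounds in the usage lines are hand-picked.

```python
import numpy as np


def linear_joiner_log_norm(am, lm):
    """log sum_v exp(am[t, v] + lm[u, v]) for all (t, u), never forming a (T, U+1, V) array.

    am: (T, V) encoder-side projection; lm: (U+1, V) decoder-side projection.
    Since exp(am[t] + lm[u]) = exp(am[t]) * exp(lm[u]), the sum over v is a matrix product.
    """
    am_max = am.max(axis=1, keepdims=True)  # (T, 1), subtracted for numerical stability
    lm_max = lm.max(axis=1, keepdims=True)  # (U+1, 1)
    prod = np.exp(am - am_max) @ np.exp(lm - lm_max).T  # (T, U+1)
    return np.log(prod) + am_max + lm_max.T


def linear_joiner_lattice(am, lm, y, blank=0):
    """Blank and label log-probs on the (t, u) lattice under the linear joiner."""
    y = np.asarray(y)
    U = len(y)
    norm = linear_joiner_log_norm(am, lm)  # (T, U+1)
    log_blank = am[:, [blank]] + lm[:, blank][None, :] - norm  # (T, U+1)
    log_sym = am[:, y] + lm[np.arange(U), y][None, :] - norm[:, :U]  # (T, U)
    return log_blank, log_sym


def pruned_joiner_logits(enc, dec, s_begin, S, joiner):
    """Run a (non-linear) joiner only on the S kept label positions of each frame.

    enc: (T, D) encoder frames; dec: (U+1, D) decoder states.
    Returns logits of shape (T, S, V) instead of (T, U+1, V).
    """
    idx = np.asarray(s_begin)[:, None] + np.arange(S)[None, :]  # (T, S) kept u indices
    return joiner(enc[:, None, :], dec[idx])  # inputs broadcast to (T, S, D)


# Toy usage with random weights (all sizes and weights are hypothetical).
rng = np.random.default_rng(0)
T, U, V, D, S = 6, 3, 10, 8, 2
enc, dec = rng.normal(size=(T, D)), rng.normal(size=(U + 1, D))
W_am, W_lm, W_out = (rng.normal(size=(D, V)) for _ in range(3))
y = [4, 7, 2]

log_blank, log_sym = linear_joiner_lattice(enc @ W_am, dec @ W_lm, y)
s_begin = [0, 0, 1, 1, 2, 2]  # hand-picked bounds, not derived from the lattice above
logits = pruned_joiner_logits(enc, dec, s_begin, S, lambda e, d: np.tanh(e + d) @ W_out)
print(logits.shape)  # (6, 2, 10)
```

Subtracting the per-row maxima keeps the exponentials in range for typical logit scales. However, the product can still underflow when the encoder and decoder sides peak on very different tokens by a very large margin.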

Code & Models

Models

marcoyang/icefall-asr-librispeech-finetune-hubert-transducer-2022-12-26 (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Pruning