Label-Looping: Highly Efficient Decoding for Transducers
Vladimir Bataev, Hainan Xu, Daniel Galvez, Vitaly Lavrukhin, Boris Ginsburg

TL;DR
This paper presents a greedy decoding algorithm for Transducer speech recognition models that speeds up batched decoding by swapping the frame and label loops and by storing partial hypotheses in CUDA tensors, so that hypotheses can be manipulated in parallel.
Contribution
The paper introduces a label-looping decoding algorithm that enhances efficiency and is compatible with various Transducer models, supported by an open-source implementation.
Findings
Up to 2.0X faster than conventional batched decoding at batch size 32.
Compatible with conventional and Token-and-Duration Transducers.
Further speedups achievable with additional GPU and compiler optimizations.
Abstract
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We redesign the standard nested-loop design for RNN-T decoding, swapping loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special structure using CUDA tensors, supporting parallelized hypotheses manipulations. Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding when using batch size 32. It can be further combined with other compiler or GPU call-related techniques to achieve even more speedup. Our algorithm is general-purpose and can work with both conventional Transducers and Token-and-Duration Transducers. We open-source our implementation to benefit the research…
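The loop swap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python lists rather than CUDA tensors, and `toy_joint`, `frame_loop_decode`, and `label_loop_decode` are hypothetical names introduced only for this example. The toy joint stands in for the encoder, prediction network, and joint network, with the last emitted label playing the role of the prediction-network state. The per-frame symbol cap that practical decoders usually apply is omitted.

```python
BLANK = 0

def toy_joint(frame, last_label):
    """Hypothetical stand-in for the joint network: emit the frame's value
    unless it is blank or repeats the last emitted label."""
    return frame if frame != BLANK and frame != last_label else BLANK

def frame_loop_decode(frames, joint):
    """Conventional greedy decoding: outer loop over frames, inner over labels."""
    hyp, last = [], BLANK
    for frame in frames:
        while True:
            label = joint(frame, last)
            if label == BLANK:
                break  # blank: move on to the next frame
            hyp.append(label)
            last = label
    return hyp

def label_loop_decode(batch, joint):
    """Label-looping sketch: the outer loop runs once per emitted label, the
    inner loop advances each utterance's own frame index past blanks."""
    size = len(batch)
    lengths = [len(frames) for frames in batch]
    time_idx = [0] * size
    last = [BLANK] * size
    hyps = [[] for _ in range(size)]
    active = [n > 0 for n in lengths]
    while any(active):
        # Score the current frame of every active utterance.
        labels = [joint(batch[b][time_idx[b]], last[b]) if active[b] else BLANK
                  for b in range(size)]
        searching = [active[b] and labels[b] == BLANK for b in range(size)]
        # Inner loop: skip frames until each utterance finds a non-blank label
        # or runs out of frames.
        while any(searching):
            for b in range(size):
                if not searching[b]:
                    continue
                time_idx[b] += 1
                if time_idx[b] >= lengths[b]:
                    active[b] = False
                    searching[b] = False
                else:
                    labels[b] = joint(batch[b][time_idx[b]], last[b])
                    searching[b] = labels[b] == BLANK
        # Every still-active utterance now holds exactly one non-blank label.
        for b in range(size):
            if active[b]:
                hyps[b].append(labels[b])
                last[b] = labels[b]
    return hyps

batch = [[0, 3, 3, 0, 5], [7, 0, 0], [0, 0]]
print(label_loop_decode(batch, toy_joint))  # [[3, 5], [7], []]
```

In this toy setting both functions return the same hypotheses. The difference is structural: in the label-looping form each outer iteration advances the label state of all active utterances once, which is the property that makes the batched version amenable to parallel tensor operations.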
Taxonomy
Topics: Embedded Systems Design Techniques · Algorithms and Data Compression · Digital Filter Design and Implementation
