Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg

TL;DR
This paper presents a universal method for accelerating beam search in Transducer-based ASR models. It brings beam search close to greedy decoding in speed while improving recognition accuracy.
Contribution
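It introduces an approach that speeds up beam search for Transducers by combining batch operations, a tree-based hypothesis structure, novel blank scoring for shallow fusion, and CUDA graph execution. The approach is used to implement two optimized algorithms, ALSD++ and AES++.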
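The tree-based hypothesis structure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each surviving hypothesis is an integer node id, with -1 standing for the empty hypothesis. Each non-blank emission appends one (parent, token) node, and full transcripts are recovered by walking parent pointers once at the end. The class name `HypothesisTree` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HypothesisTree:
    """Hypothetical sketch of prefix-sharing hypothesis storage.

    Node ids index into the parallel `parents` / `tokens` lists.
    The id -1 denotes the empty hypothesis (no tokens emitted yet).
    """
    parents: List[int] = field(default_factory=list)
    tokens: List[int] = field(default_factory=list)

    def add(self, parent: int, token: int) -> int:
        """Extend hypothesis `parent` by one non-blank `token`; return the new node id."""
        self.parents.append(parent)
        self.tokens.append(token)
        return len(self.tokens) - 1

    def backtrack(self, node: int) -> List[int]:
        """Recover the full token sequence ending at `node` by following parent pointers."""
        out: List[int] = []
        while node != -1:
            out.append(self.tokens[node])
            node = self.parents[node]
        return out[::-1]


# Usage: two hypotheses share the prefix node `a`.
tree = HypothesisTree()
a = tree.add(-1, 5)  # hypothesis [5]
b = tree.add(a, 7)   # hypothesis [5, 7]
c = tree.add(a, 9)   # hypothesis [5, 9], reusing node a
```

Hypotheses that share a prefix also share its nodes, so extending the beam never copies token lists. In this sketch a blank emission simply leaves a hypothesis's node id unchanged.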
Findings
Speed gap between beam and greedy decoding reduced to 10-20% for the whole system.
Achieved 14-30% relative WER improvement over greedy decoding.
Enhanced shallow fusion performance by up to 11% in low-resource scenarios.
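The relative figures above can be read with the usual formulas below. This sketch assumes two definitions. "Relative WER improvement" is taken to mean the WER reduction divided by the greedy WER. The speed gap is taken to mean the extra decoding time of beam search relative to greedy decoding. The function names and the numbers in the usage lines are illustrative only, not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric (e.g. WER), in percent."""
    return 100.0 * (baseline - improved) / baseline


def relative_slowdown(greedy_time: float, beam_time: float) -> float:
    """Extra decoding time of beam search over greedy decoding, in percent."""
    return 100.0 * (beam_time - greedy_time) / greedy_time


# Illustrative numbers only (not from the paper):
wer_gain = relative_improvement(10.0, 8.0)  # 20.0: WER 10.0 -> 8.0
gap = relative_slowdown(100.0, 115.0)       # 15.0: 100 s greedy vs 115 s beam
```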
Abstract
Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly slows down Transducers due to repeated evaluations of key network components, limiting practical applications. This paper introduces a universal method to accelerate beam search for Transducers, enabling the implementation of two optimized algorithms: ALSD++ and AES++. The proposed method utilizes batch operations, a tree-based hypothesis structure, novel blank scoring for enhanced shallow fusion, and CUDA graph execution for efficient GPU inference. This narrows the speed gap between beam and greedy modes to only 10-20% for the whole system, achieves 14-30% relative improvement in WER compared to greedy decoding, and improves shallow fusion for…
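Batched hypothesis expansion, one of the ingredients listed in the abstract, can be illustrated with the pruning step below. This is a hedged sketch of generic beam pruning, not ALSD++ or AES++ themselves. It makes three assumptions: the joint network has already produced next-token log-probabilities for every hypothesis, blank handling and the paper's blank scoring are ignored, and the names are hypothetical. On a GPU the per-utterance loop would collapse into a single top-k over a [batch, beam × vocab] tensor. The key trick shown here is recovering the source hypothesis and token from the flattened index.

```python
import heapq
from typing import List, Tuple


def expand_beams(
    beam_scores: List[List[float]],           # [batch][beam] cumulative log-probs
    token_logprobs: List[List[List[float]]],  # [batch][beam][vocab] next-token log-probs
    beam_size: int,
) -> List[List[Tuple[int, int, float]]]:
    """Return, per utterance, the best `beam_size` (hyp index, token, score) triples.

    Scores are flattened to one beam*vocab row per utterance, so one top-k
    call per row selects the survivors. This mirrors a single batched top-k
    over a [batch, beam*vocab] tensor.
    """
    results = []
    for scores, logprobs in zip(beam_scores, token_logprobs):
        vocab = len(logprobs[0])
        # Cumulative score of extending each hypothesis by each token.
        flat = [s + lp for s, row in zip(scores, logprobs) for lp in row]
        best = heapq.nlargest(beam_size, range(len(flat)), key=flat.__getitem__)
        # Flat index i corresponds to hypothesis i // vocab and token i % vocab.
        results.append([(i // vocab, i % vocab, flat[i]) for i in best])
    return results


# Usage: one utterance, beam of 2, vocabulary of 3.
out = expand_beams(
    beam_scores=[[0.0, -1.0]],
    token_logprobs=[[[-0.1, -2.0, -3.0], [-0.5, -0.2, -4.0]]],
    beam_size=2,
)
```

Keeping the selection as one flat top-k per utterance avoids per-hypothesis Python loops. This is the kind of data layout that lets the whole step run as fixed-shape tensor operations on a GPU.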
