WIND: Accelerated RNN-T Decoding with Windowed Inference for Non-blank Detection
Hainan Xu, Vladimir Bataev, Lilit Grigoryan, Boris Ginsburg

TL;DR
WIND introduces a windowed inference method for RNN-T decoding that speeds up inference by scoring multiple frames in parallel, preserving accuracy across greedy, batched greedy, and beam-search decoding.
Contribution
The paper presents WIND, a novel windowed inference technique for RNN-T decoding that significantly speeds up inference without loss of accuracy, and also proposes a new beam-search algorithm.
Findings
Up to 2.4x speed-up in greedy decoding over the sequential baseline
Maintains WER identical to the baseline methods
Achieves better speed and accuracy with the new beam-search algorithm
Abstract
We propose Windowed Inference for Non-blank Detection (WIND), a novel strategy that significantly accelerates RNN-T inference without compromising model accuracy. During model inference, instead of processing frames sequentially, WIND processes multiple frames within a window in parallel, allowing the model to quickly locate non-blank predictions during decoding and resulting in significant speed-ups. We implement WIND for greedy decoding and for batched greedy decoding with label-looping techniques, and also propose a novel beam-search decoding method. Experiments on multiple datasets under different conditions show that, in greedy modes, our method achieves speed-ups of up to 2.4X over the baseline sequential approach while maintaining identical Word Error Rate (WER). Our beam-search algorithm achieves slightly better accuracy than alternative methods,…
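The windowing idea in the abstract can be illustrated with a minimal Python sketch. It rests on our reading of the abstract and on one standard RNN-T property: blank predictions do not update the decoder (prediction network) state, so every frame in a window can be scored against the same state in one batched call, and decoding can jump straight to the first non-blank frame. Everything here is hypothetical and not taken from the paper's code: the names `toy_joint`, `greedy_sequential`, `greedy_windowed`, the `window` and `max_symbols` parameters, and a toy "decoder state" that is just the last emitted token. The number of joint calls stands in for the number of sequential decoding steps.

```python
from typing import Callable, List, Optional, Sequence, Tuple

BLANK = 0  # index of the blank symbol (assumption for this sketch)

# Hypothetical joint interface: scores a batch of encoder frames against ONE
# decoder state in a single call and returns one logit vector per frame.
JointFn = Callable[[Sequence[List[float]], int], List[List[float]]]


def argmax(xs: List[float]) -> int:
    """Index of the first maximum."""
    return max(range(len(xs)), key=xs.__getitem__)


def toy_joint(frames: Sequence[List[float]], state: int) -> List[List[float]]:
    """Toy stand-in for the joint network: the decoder 'state' is simply the
    last emitted token, which is penalized so a frame does not repeat it."""
    out = []
    for f in frames:
        logits = list(f)
        if state != BLANK:
            logits[state] -= 10.0
        out.append(logits)
    return out


def greedy_sequential(enc: Sequence[List[float]], joint: JointFn,
                      max_symbols: int = 3) -> Tuple[List[int], int]:
    """Frame-by-frame RNN-T greedy decoding; returns (tokens, joint calls)."""
    hyp, state, calls = [], BLANK, 0
    for t in range(len(enc)):
        for _ in range(max_symbols):  # cap on symbols emitted per frame
            k = argmax(joint([enc[t]], state)[0])
            calls += 1
            if k == BLANK:
                break  # blank: advance to the next frame
            hyp.append(k)
            state = k  # only non-blank emissions update the decoder state
    return hyp, calls


def greedy_windowed(enc: Sequence[List[float]], joint: JointFn, window: int = 4,
                    max_symbols: int = 3) -> Tuple[List[int], int]:
    """Windowed greedy decoding: score frames t..t+window-1 with the current
    decoder state in one call and jump to the first non-blank prediction."""
    hyp, state, calls = [], BLANK, 0
    t, emitted = 0, 0  # current frame and symbols already emitted at it
    while t < len(enc):
        if emitted >= max_symbols:
            t, emitted = t + 1, 0  # per-frame symbol cap reached
            continue
        end = min(t + window, len(enc))
        preds = [argmax(lg) for lg in joint(enc[t:end], state)]
        calls += 1
        hit: Optional[int] = next(
            (i for i, k in enumerate(preds) if k != BLANK), None)
        if hit is None:
            t, emitted = end, 0  # whole window is blank: skip it at once
            continue
        if hit > 0:
            t, emitted = t + hit, 0  # blanks before `hit` left the state unchanged
        hyp.append(preds[hit])
        state = preds[hit]
        emitted += 1
    return hyp, calls


# Toy encoder output: B favors blank, A1 favors token 1, AB favors tokens 1 and 2.
B, A1, AB = [5.0, 0.0, 0.0], [1.0, 5.0, 0.0], [1.0, 5.0, 4.0]
enc = [B, B, A1, B, B, B, B, B, AB, B, B, A1]
print(greedy_sequential(enc, toy_joint))          # ([1, 2, 1, 2, 1], 16)
print(greedy_windowed(enc, toy_joint, window=4))  # ([1, 2, 1, 2, 1], 7)
```

Because the state is only updated on non-blank emissions, the windowed loop reproduces the sequential hypothesis exactly while issuing fewer joint calls (7 instead of 16 on this toy input), which is consistent with the identical-WER claim. The paper's actual batching, window size, and label-looping integration may differ from this sketch.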
