# A Token-Wise Beam Search Algorithm for RNN-T

**Authors:** Gil Keren

arXiv: 2302.14357 · 2023-10-09

## TL;DR

This paper introduces a token-wise beam search algorithm for RNN-T that batches joint network calls across a segment of time steps. In speech recognition experiments this yields 20%-96% faster decoding, up to 11% relative improvement in oracle WER, and a slight improvement in general WER.

## Contribution

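Standard decoding processes one time step at a time. The proposed algorithm instead batches joint network calls across a segment of time steps. This reduces the number of joint network calls, which previous work identified as an important factor limiting decoding speed, and yields substantial speedups and better word error rates than standard time-step decoding.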
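The batching idea can be illustrated with a toy joint network. In RNN-T the prediction network depends only on previously emitted non-blank labels, so for a fixed set of beam hypotheses its outputs stay constant across frames until a label is emitted. That is what makes it possible to score a whole segment in one call. The sketch below is not the paper's implementation: the additive joint `tanh(enc + pred) @ W_out` and the names `joint`, `per_frame`, and `batched` are hypothetical. It only checks that S per-frame calls and one broadcast call over the segment produce the same log-probabilities.

```python
import numpy as np


def joint(enc, pred, W_out):
    """Hypothetical additive joint network: log_softmax(tanh(enc + pred) @ W_out).

    enc and pred must be broadcastable to (..., D); returns log-probs of shape (..., V).
    """
    h = np.tanh(enc + pred)
    logits = h @ W_out
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))


def per_frame(enc_seg, pred_beam, W_out):
    """Time-step style: one joint call per frame, each over the B hypotheses."""
    outs = [joint(enc_seg[t][None, :], pred_beam, W_out) for t in range(enc_seg.shape[0])]
    return np.stack(outs, axis=1)  # (B, S, V)


def batched(enc_seg, pred_beam, W_out):
    """Segment style: one joint call, broadcasting (1, S, D) with (B, 1, D)."""
    return joint(enc_seg[None, :, :], pred_beam[:, None, :], W_out)  # (B, S, V)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B, S, D, V = 4, 8, 16, 32  # beam size, segment length, hidden size, vocab size
    enc_seg = rng.standard_normal((S, D))    # encoder outputs for one segment
    pred_beam = rng.standard_normal((B, D))  # prediction-net outputs, one per hypothesis
    W_out = rng.standard_normal((D, V))
    print(np.allclose(per_frame(enc_seg, pred_beam, W_out), batched(enc_seg, pred_beam, W_out)))
```

The results match, but `batched` issues one large call instead of S small ones. This is the kind of reduction in joint network calls the paper credits for its speedups.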

## Key findings

- Achieves 20%-96% decoding speedup across models
- Improves oracle WER by up to 11% relative with larger segments
- Slightly improves (lowers) general WER
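Oracle WER is conventionally the WER obtained when, for each utterance, the hypothesis in the final beam with the fewest word errors is selected. It therefore measures the quality of the beam's contents rather than of the top-ranked output. The following is a minimal sketch of that standard definition using word-level Levenshtein distance; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def oracle_wer(refs, nbest_lists):
    """Total errors of the best hypothesis per utterance, divided by reference words."""
    errors = sum(
        min(edit_distance(ref.split(), hyp.split()) for hyp in nbest)
        for ref, nbest in zip(refs, nbest_lists)
    )
    words = sum(len(ref.split()) for ref in refs)
    return errors / words
```

A beam that contains more correct alternatives lowers oracle WER even when the top hypothesis is unchanged. This is why oracle WER can improve much more than general WER.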

## Abstract

Standard Recurrent Neural Network Transducer (RNN-T) decoding algorithms for speech recognition iterate over the time axis, such that one time step is decoded before moving on to the next. These algorithms result in a large number of calls to the joint network, which previous work has shown to be an important factor that reduces decoding speed. We present a beam search decoding algorithm that batches the joint network calls across a segment of time steps, which results in 20%-96% decoding speedups consistently across all models and settings we experimented with. In addition, aggregating emission probabilities over a segment may be seen as a better approximation to finding the most likely model output, causing our algorithm to improve oracle word error rate by up to 11% relative as the segment size increases, and to slightly improve general word error rate.
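One plausible reading of "aggregating emission probabilities over a segment" is offered here as an assumption, not as the paper's exact procedure. Consider a single hypothesis with a fixed prediction-network state. The probability of extending it with token k somewhere in the segment is the sum, over frames t, of P(blank on every earlier frame of the segment) × P(k at t), rather than the probability at one frame. The sketch computes these scores in log space. It deliberately ignores the frames after the emission, which a full decoder would score with the updated prediction state, and it ignores multiple emissions within a segment. `segment_token_scores` and its arguments are hypothetical names.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def segment_token_scores(log_probs, blank=0):
    """Aggregate per-frame emission log-probabilities over one segment.

    log_probs: S frames, each a list of V log-probabilities for ONE hypothesis
    whose prediction-network state is held fixed (a simplifying assumption).
    Returns (scores, all_blank), where scores[k] = log P(the first emission in
    the segment is token k) and all_blank = log P(blank on every frame).
    """
    vocab = range(len(log_probs[0]))
    terms = {k: [] for k in vocab if k != blank}
    prefix_blank = 0.0  # log P(blank on every frame before the current one)
    for frame in log_probs:
        for k in terms:
            # Alignment: stay blank until this frame, then emit token k here.
            terms[k].append(prefix_blank + frame[k])
        prefix_blank += frame[blank]
    scores = {k: logsumexp(v) for k, v in terms.items()}
    return scores, prefix_blank
```

Taking only the best single frame (a max over t) would keep one alignment per token. The sum instead counts every alignment that produces the same token, which is one way to interpret the abstract's claim that aggregation better approximates the most likely output.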

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14357/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/2302.14357/full.md

---
Source: https://tomesphere.com/paper/2302.14357