Powerful and Extensible WFST Framework for RNN-Transducer Losses
Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg

TL;DR
This paper introduces a flexible WFST-based framework for RNN-Transducer losses, enabling easier development, debugging, and extension of RNN-T models with new loss functions, demonstrated by a novel W-Transducer loss.
Contribution
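It presents two WFST-powered implementations of the RNN-T loss, Compose-Transducer and Grid-Transducer, that are easier to extend and debug than CUDA-based implementations while remaining computationally competitive, along with a new W-Transducer loss for weakly supervised learning.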
Findings
W-Transducer outperforms standard RNN-T in weakly-supervised scenarios.
WFST-based implementations are easier to extend and debug than CUDA-based ones, in part because the graphs can be visualized.
The framework is integrated into the NeMo toolkit.
Abstract
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on a composition of the WFST graphs from acoustic and textual schema -- computationally competitive and easy to modify; (2) "Grid-Transducer", which constructs the lattice directly for further computations -- most compact, and computationally efficient. We illustrate the ease of extensibility through introduction of a new W-Transducer loss -- the adaptation of the Connectionist Temporal Classification with Wild Cards. W-Transducer (W-RNNT) consistently outperforms the…
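To make the lattice idea concrete, here is a minimal pure-Python sketch of an RNN-T loss computed over an explicitly enumerated grid lattice, in the spirit of the "Grid-Transducer" described above. It is an illustration under stated assumptions, not the paper's or NeMo's implementation: the function names (`build_grid_arcs`, `rnnt_neg_log_likelihood`), the state numbering, and the `log_probs[t][u][v]` layout are all hypothetical, and a plain forward pass in log space stands in for a real WFST library.

```python
import math


def build_grid_arcs(num_frames, targets, blank=0):
    """Enumerate RNN-T lattice arcs as (src, dst, label, t, u) tuples.

    Hypothetical layout: state (t, u) is numbered t * (U + 1) + u, and a
    single final state is appended after the grid.
    """
    num_labels = len(targets)

    def state(t, u):
        return t * (num_labels + 1) + u

    final = num_frames * (num_labels + 1)
    arcs = []
    for t in range(num_frames):
        for u in range(num_labels + 1):
            # Label arc: emit targets[u] and stay on the same frame.
            if u < num_labels:
                arcs.append((state(t, u), state(t, u + 1), targets[u], t, u))
            # Blank arc: advance one frame; only the last grid state
            # (T - 1, U) may take a blank into the final state.
            if t + 1 < num_frames:
                arcs.append((state(t, u), state(t + 1, u), blank, t, u))
            elif u == num_labels:
                arcs.append((state(t, u), final, blank, t, u))
    return arcs, final


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_neg_log_likelihood(log_probs, targets, blank=0):
    """Sum all lattice paths in log space and return the negative total.

    log_probs[t][u][v] is assumed to hold the log-probability of label v
    at frame t after u target labels have been emitted.
    """
    arcs, final = build_grid_arcs(len(log_probs), targets, blank)
    alpha = [-math.inf] * (final + 1)
    alpha[0] = 0.0
    # Arcs are generated in topological order of their source states,
    # so one sweep over the list completes the forward pass.
    for src, dst, label, t, u in arcs:
        alpha[dst] = logaddexp(alpha[dst], alpha[src] + log_probs[t][u][label])
    return -alpha[final]


if __name__ == "__main__":
    # Toy input: 2 frames, 1 target label, uniform scores over 3 labels.
    uniform = [[[math.log(1 / 3)] * 3 for _ in range(2)] for _ in range(2)]
    print(rnnt_neg_log_likelihood(uniform, targets=[1]))
```

Because the lattice is just a list of arcs, a loss variant in this sketch amounts to changing which arcs `build_grid_arcs` emits, which is the kind of extensibility the abstract attributes to the WFST formulation.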
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
Methods: weighted finite state transducer
