Powerful and Extensible WFST Framework for RNN-Transducer Losses

Aleksandr Laptev; Vladimir Bataev; Igor Gitman; Boris Ginsburg

arXiv:2303.10384 · eess.AS · May 10, 2023 · 1 citation

TL;DR

This paper introduces a flexible WFST-based framework for RNN-Transducer losses, enabling easier development, debugging, and extension of RNN-T models with new loss functions, demonstrated by a novel W-Transducer loss.

Contribution

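The paper presents two WFST-powered RNN-T loss implementations, Compose-Transducer and Grid-Transducer, that are easier to extend and debug than existing CUDA-based code while remaining computationally competitive, along with a new W-Transducer loss for weakly-supervised learning.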

Findings

01. W-Transducer outperforms standard RNN-T in weakly-supervised scenarios.

02. WFST-based implementations are easier to extend and debug than existing CUDA-based RNN-T implementations.

03. The framework is integrated into the NeMo toolkit.

Abstract

This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss. Existing implementations of RNN-T use CUDA-related code, which is hard to extend and debug. WFSTs are easy to construct and extend, and allow debugging through visualization. We introduce two WFST-powered RNN-T implementations: (1) "Compose-Transducer", based on a composition of the WFST graphs from acoustic and textual schema -- computationally competitive and easy to modify; (2) "Grid-Transducer", which constructs the lattice directly for further computations -- most compact, and computationally efficient. We illustrate the ease of extensibility through introduction of a new W-Transducer loss -- the adaptation of the Connectionist Temporal Classification with Wild Cards. W-Transducer (W-RNNT) consistently outperforms the…
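As an illustration of the lattice the abstract refers to, the sketch below computes the standard RNN-T loss by a forward pass over the T × (U+1) grid of (frame, emitted-label-count) states. It is not the paper's WFST implementation and does not use NeMo or any WFST library; the function name `rnnt_loss_reference`, the nested-list input layout, and the choice of blank index 0 are assumptions made for this example only.

```python
import math


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss_reference(log_probs, targets, blank=0):
    """Negative log-likelihood of `targets` under the standard RNN-T lattice.

    log_probs[t][u][k] is the log-probability of symbol k at frame t after
    u target labels have been emitted (shape T x (U+1) x V, nested lists).
    This layout is an assumption of the sketch, not a library convention.
    """
    T = len(log_probs)
    U = len(targets)
    # alpha[t][u]: log-probability of reaching grid node (t, u).
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Blank arc: advance one frame, keep the label count.
            from_blank = -math.inf
            if t > 0:
                from_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            # Label arc: emit the next target label, stay on the same frame.
            from_label = -math.inf
            if u > 0:
                from_label = alpha[t][u - 1] + log_probs[t][u - 1][targets[u - 1]]
            alpha[t][u] = _logaddexp(from_blank, from_label)
    # A final blank arc leaves the last grid node.
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])
```

In a WFST formulation the same grid nodes become states and the blank and label transitions become weighted arcs, so the total score over all paths plays the role of the `alpha` recursion here. Loss variants then amount to adding or re-weighting arcs rather than rewriting the recursion.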

Code & Models

Repositories

NVIDIA/NeMo (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: weighted finite-state transducer