Sequence Transduction with Graph-based Supervision

Niko Moritz; Takaaki Hori; Shinji Watanabe; Jonathan Le Roux

arXiv:2111.01272 · cs.CL · April 1, 2022

Open Access

TL;DR

This paper introduces a generalized transducer loss function that uses graph-based supervision, leading to improved speech recognition accuracy and more flexible alignment control compared to traditional RNN-T models.

Contribution

It proposes a new transducer objective that accepts graph representations of labels, enabling flexible manipulation of training lattices and better optimization.

Findings

01 Achieves a 4.8% relative improvement on LibriSpeech test-other compared with an RNN-T baseline

02 Ensures strictly monotonic alignments for better decoding

03 Demonstrates the effectiveness of graph-based supervision in transducer models

Abstract

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is yet largely unknown if these rules are optimal and do lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. We demonstrate that transducer-based ASR with CTC-like lattice achieves better results…
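To make the idea of "rules that define how a set of alignments is generated" concrete, the following is a minimal sketch of full-sum training scores over a lattice under two transition rules. It is not the paper's implementation: the function name `full_sum_log_likelihood`, the `strictly_monotonic` flag, and the table layout (`log_blank[t][u]`, `log_label[t][u]`) are assumptions made for illustration. The strictly monotonic rule used here (exactly one symbol per frame) is a simplified stand-in and may differ from the CTC-like graph the authors actually use.

```python
import math

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def full_sum_log_likelihood(log_blank, log_label, strictly_monotonic=False):
    """Sum over all alignments in a lattice (hypothetical helper).

    log_blank[t][u]: log-prob of blank at frame t after u labels, shape T x (U+1).
    log_label[t][u]: log-prob of label u+1 at frame t after u labels, shape T x U.
    State (t, u) means: t frames consumed, u labels emitted.
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1
    alpha = [[NEG_INF] * (U + 1) for _ in range(T + 1)]
    alpha[0][0] = 0.0
    for t in range(T + 1):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            score = NEG_INF
            # Blank arc: consumes one frame, keeps the label count.
            if t >= 1:
                score = logaddexp(score, alpha[t - 1][u] + log_blank[t - 1][u])
            if u >= 1:
                if strictly_monotonic:
                    # Assumed rule: a label also consumes one frame.
                    if t >= 1:
                        score = logaddexp(
                            score, alpha[t - 1][u - 1] + log_label[t - 1][u - 1]
                        )
                elif t < T:
                    # RNN-T rule: a label stays on the same frame.
                    score = logaddexp(score, alpha[t][u - 1] + log_label[t][u - 1])
            alpha[t][u] = score
    return alpha[T][U]


# With all log-probs set to 0, exp(result) counts the alignments in the lattice.
T, U = 3, 2
log_blank = [[0.0] * (U + 1) for _ in range(T)]
log_label = [[0.0] * U for _ in range(T)]
n_rnnt = math.exp(full_sum_log_likelihood(log_blank, log_label))
n_mono = math.exp(full_sum_log_likelihood(log_blank, log_label, strictly_monotonic=True))
print(round(n_rnnt), round(n_mono))
```

For 3 frames and 2 labels, the RNN-T rule admits 6 alignments and the strictly monotonic rule admits 3, which shows how changing the transition rule alone reshapes the set of paths the loss sums over. A graph-based loss generalizes this by reading the allowed arcs from a graph rather than hard-coding them in the recursion.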

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems