Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept

Wei Zhou; Albert Zeyer; André Merboldt; Ralf Schlüter; Hermann Ney

arXiv:2104.06104 · cs.CL · October 24, 2023

Open Access

TL;DR

This paper proves the theoretical equivalence between RNN-Transducer and segmental models in speech recognition, showing they have the same modeling power and exploring decoding strategies through initial experiments.

Contribution

It establishes the formal equivalence between transducer and segmental models, linking their internal mechanisms and demonstrating their comparable capabilities.

Findings

01. Blank probabilities correspond to segment length probabilities.

02. Time-synchronous and label-synchronous decoding strategies have distinct properties.

03. Transducer and segmental models are theoretically equivalent in modeling power.

Abstract

With the advent of direct models in automatic speech recognition (ASR), the formerly prevalent frame-wise acoustic modeling based on hidden Markov models (HMM) diversified into a number of modeling architectures like encoder-decoder attention models, transducer models and segmental models (direct HMM). While transducer models stay with a frame-level model definition, segmental models are defined on the level of label segments directly. While (soft-)attention-based models avoid explicit alignment, transducer and segmental approaches internally do model alignment, either by segment hypotheses or, more implicitly, by emitting so-called blank symbols. In this work, we prove that the widely used class of RNN-Transducer models and segmental models (direct HMM) are equivalent and therefore show equal modeling power. It is shown that blank probabilities translate into segment length probabilities…
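
The statement that blank probabilities translate into segment length probabilities can be illustrated with a small sketch. It is not taken from the paper: it assumes a simplified transducer in which every frame emits either blank or exactly one label, and the function name `segment_length_distribution` and its input layout are hypothetical. Under that assumption, a segment of length l arises when the model emits blank on the first l - 1 frames after the previous label and a non-blank label on frame l.

```python
from typing import List, Tuple


def segment_length_distribution(blank_probs: List[float]) -> Tuple[List[float], float]:
    """Turn per-frame blank probabilities into segment length probabilities.

    Assumption (illustrative, not the paper's exact formulation):
    blank_probs[i] is the blank probability on the (i+1)-th frame after
    the previous label, and each frame emits blank or exactly one label.

    Returns (length_probs, tail), where length_probs[i] is the probability
    that the segment has length i + 1, and tail is the probability that
    the segment extends beyond the given frames.
    """
    length_probs = []
    all_blank_so_far = 1.0  # probability that every earlier frame was blank
    for b in blank_probs:
        # Segment ends here: blanks on all earlier frames, non-blank now.
        length_probs.append(all_blank_so_far * (1.0 - b))
        all_blank_so_far *= b
    return length_probs, all_blank_so_far


if __name__ == "__main__":
    probs, tail = segment_length_distribution([0.9, 0.6, 0.2])
    print(probs, tail)
```

The length probabilities and the remaining tail mass always sum to one, so under these assumptions the frame-level blank decisions define a proper distribution over segment lengths.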

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques