Transducer Consistency Regularization for Speech to Text Applications

Cindy Tseng; Yun Tang; Vijendra Raj Apsingekar

arXiv:2410.07491·cs.CL·November 12, 2024

TL;DR

This paper introduces Transducer Consistency Regularization (TCR), a method that improves transducer-based speech-to-text models by encouraging consistent outputs across distorted inputs, reducing word error rate (WER) by 4.3% relative on LibriSpeech.

Contribution

The paper proposes TCR, a new regularization technique for transducer models that weights alignments based on proximity to oracle alignments, enhancing model training and performance.

Findings

1. Reduces WER by 4.3% relative on LibriSpeech.
2. Outperforms other consistency regularization methods.
3. Effectively leverages data distortions and alignment weighting.

Abstract

Consistency regularization is a commonly used practice that encourages a model to generate consistent representations from distorted input features and thereby improves model generalization. It has shown significant improvements in various speech applications that are optimized with a cross-entropy criterion. However, it is not straightforward to apply consistency regularization to transducer-based approaches, which are widely adopted for speech applications due to their competitive performance and streaming characteristics. The main challenge comes from the vast alignment space of the transducer optimization criterion: not all alignments within that space contribute equally to model optimization. In this study, we present Transducer Consistency Regularization (TCR), a consistency regularization method for transducer models. We apply distortions such as spec augmentation and dropout to create…
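To make the general idea in the abstract concrete, here is a minimal sketch of plain consistency regularization on a toy classifier. It is not the TCR method itself: it uses a per-class softmax rather than a transducer alignment lattice, it does no alignment weighting, and it uses only a dropout-style distortion of the input features (spec augmentation is not simulated). The function names (`distort`, `toy_model`, `consistency_loss`), the linear model, and the symmetric KL divergence as the consistency measure are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distort(features, drop_prob, rng):
    """Dropout-style distortion: zero each feature with probability
    drop_prob and rescale the survivors (inverted dropout)."""
    keep = 1.0 - drop_prob
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def toy_model(features, weights):
    """Hypothetical stand-in for a network: one linear layer,
    with one weight row per output class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def consistency_loss(features, weights, drop_prob=0.2, seed=0):
    """Symmetric KL between predictions on two distorted views of one input."""
    rng = random.Random(seed)
    view_a = distort(features, drop_prob, rng)
    view_b = distort(features, drop_prob, rng)
    p = softmax(toy_model(view_a, weights))
    q = softmax(toy_model(view_b, weights))
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    features = [0.5, -1.0, 2.0, 0.3]
    weights = [
        [0.1, 0.2, -0.3, 0.4],
        [0.0, -0.5, 0.2, 0.1],
        [0.3, 0.3, 0.3, -0.2],
    ]
    # In training, this term would be added to the main task loss
    # with some weighting factor.
    print(consistency_loss(features, weights, drop_prob=0.5, seed=1))
```

With `drop_prob=0.0` the two views are identical and the loss is exactly zero. In a transducer, the output is a distribution over a whole lattice of alignments rather than a single softmax, which is the difficulty the abstract points to.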

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Dropout