Better Intermediates Improve CTC Inference

Tatsuya Komatsu; Yusuke Fujita; Jaesong Lee; Lukas Lee; Shinji Watanabe; Yusuke Kida

arXiv:2204.00176 · cs.CL · April 4, 2022

TL;DR

This paper introduces two conditioning methods for CTC inference, searched intermediate conditioning and multi-pass conditioning, which yield relative improvements of up to 3%/12% on the LibriSpeech test-clean/test-other sets over the original self-conditioned CTC.

Contribution

The paper formulates self-conditioned CTC as a probabilistic model with the intermediate prediction as a latent representation, and proposes two conditioning techniques based on that formulation to improve inference accuracy.

Findings

01

Achieved up to 3% relative improvement on the LibriSpeech test-clean set compared to the original self-conditioned CTC.

02

Achieved up to 12% relative improvement on the LibriSpeech test-other set compared to the original self-conditioned CTC.

03

Demonstrated effectiveness of searched intermediates and multi-pass conditioning.
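
The findings above are stated as relative improvements. As a small illustration of how such a figure is computed, the sketch below assumes the metric is an error rate such as word error rate (the page does not name the metric). The function name and the input numbers are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the reduction in an error metric as a percentage of the baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical error rates chosen only to illustrate the arithmetic;
# they are NOT taken from the paper.
example = relative_improvement(baseline_error=10.0, new_error=8.8)
print(f"{example:.1f}% relative improvement")
```

A relative figure depends on the baseline: the same absolute gain is a larger relative improvement when the baseline error is small.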

Abstract

This paper proposes a method for improved CTC inference using searched intermediates and multi-pass conditioning. We first formulate self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation, which provides a tractable conditioning framework. Based on this formulation, we then propose two new conditioning methods: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of a previous inference pass to condition the next one. These approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments on the LibriSpeech dataset show relative improvements of up to 3%/12% on the test-clean/test-other sets compared to the original self-conditioned CTC.
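
To make the conditioning ideas in the abstract concrete, here is a toy NumPy sketch under stated assumptions. It is not the authors' implementation. The encoder is a stack of random `tanh` layers, and conditioning is modeled as adding a linear projection of an intermediate prediction back into the hidden state. All names (`encode`, `one_hot_best`, `to_hidden`, and so on) are hypothetical. A frame-wise argmax stands in for beam search, and the multi-pass variant shown is one plausible reading of "uses predictions of previous inference".

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, V, N_LAYERS = 6, 8, 5, 4  # frames, hidden size, vocabulary (incl. blank), layers

# Toy random parameters (assumption): a real model would use trained encoder layers.
layer_weights = [rng.normal(scale=0.5, size=(D, D)) for _ in range(N_LAYERS)]
to_vocab = rng.normal(scale=0.5, size=(D, V))   # shared CTC output projection
to_hidden = rng.normal(scale=0.5, size=(V, D))  # maps a prediction back to hidden space


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def one_hot_best(posterior):
    """Stand-in for a searched intermediate: frame-wise argmax as one-hot rows.

    The paper refines intermediates with beam search; this greedy choice is
    only a placeholder for that step.
    """
    return np.eye(posterior.shape[-1])[posterior.argmax(axis=-1)]


def encode(x, refine=None, external=None):
    """Run the toy encoder, conditioning later layers on an intermediate prediction.

    refine:   optional function applied to each intermediate prediction
    external: optional (T, V) prediction from a previous pass, used in place
              of each layer's own intermediate prediction
    """
    h = x
    for i, w in enumerate(layer_weights):
        h = np.tanh(h @ w)
        if i < N_LAYERS - 1:  # every layer except the last feeds a prediction forward
            z = softmax(h @ to_vocab) if external is None else external
            if refine is not None:
                z = refine(z)
            h = h + z @ to_hidden
    return softmax(h @ to_vocab)  # final frame-level posteriors, shape (T, V)


x = rng.normal(size=(T, D))                # toy acoustic features
first = encode(x)                          # self-conditioning on soft intermediates
searched = encode(x, refine=one_hot_best)  # conditioning on refined intermediates
second = encode(x, external=first)         # second pass conditioned on the first pass
print(first.shape, searched.shape, second.shape)
```

The three calls differ only in what is fed back into the encoder: the layer's own soft prediction, a refined version of it, or the output of an earlier pass.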

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling