Better Intermediates Improve CTC Inference
Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

TL;DR
This paper introduces two new conditioning methods for CTC inference, searched intermediate conditioning and multi-pass conditioning, which yield relative improvements of up to 3% and 12% on the LibriSpeech test-clean and test-other sets over the original self-conditioned CTC.
Contribution
It formulates self-conditioned CTC as a probabilistic model and proposes two novel conditioning techniques that enhance inference accuracy.
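One plausible way to write such a latent-variable formulation is sketched below. The symbols are assumptions introduced here for illustration (X for the input speech, Y for the output sequence, Z for the intermediate prediction) and are not taken from the paper; the abstract states only that the intermediate prediction is treated as a latent representation and that the resulting conditioning framework is tractable.

```latex
% Marginalizing over the latent intermediate prediction Z (notation assumed):
p(Y \mid X) = \sum_{Z} p(Y \mid Z, X)\, p(Z \mid X)

% A tractable approximation replaces the sum with a single estimate \hat{Z}:
p(Y \mid X) \approx p(Y \mid \hat{Z}, X),
\qquad \hat{Z} = \operatorname*{arg\,max}_{Z}\; p(Z \mid X)
```

Under this reading, a better estimate of the intermediate prediction (for example one refined by search, or one taken from a previous inference pass) gives a better conditioning signal for the final prediction.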
Findings
Achieved up to 3% relative improvement on the LibriSpeech test-clean set over the original self-conditioned CTC.
Achieved up to 12% relative improvement on the LibriSpeech test-other set over the original self-conditioned CTC.
Demonstrated effectiveness of searched intermediates and multi-pass conditioning.
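To make "relative improvement" concrete, here is a minimal sketch of the arithmetic. It assumes the metric is an error rate such as word error rate, where lower is better; the summary does not name the metric. The numbers in the usage lines are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative error reduction in percent (assumes lower error is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical error rates, chosen only to illustrate the formula.
hypothetical_baseline = 10.0
hypothetical_new = 8.8
print(round(relative_improvement(hypothetical_baseline, hypothetical_new), 1))
```

A relative improvement depends on the baseline: the same absolute reduction is a larger relative gain when the baseline error is small.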
Abstract
This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and provides a tractable conditioning framework. We then propose two new conditioning methods based on the new formulation: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of a previous inference pass to condition the next one. These new approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments on the LibriSpeech dataset show relative performance improvements of up to 3% and 12% on the test-clean and test-other sets, respectively, compared to the original self-conditioned CTC.
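The control flow of multi-pass conditioning can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: `ToySelfConditionedCTC`, its random weights, the layer count, and the use of a hard one-hot best path as the conditioning signal are hypothetical stand-ins, not the authors' architecture. Beam search over intermediates is omitted; the per-frame argmax stands in for a searched prediction.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToySelfConditionedCTC:
    """Hypothetical toy encoder: tanh layers sharing one CTC output head."""

    def __init__(self, dim=8, vocab=5, n_layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(n_layers)]
        self.out = rng.normal(scale=0.5, size=(dim, vocab))   # hidden -> token scores
        self.back = rng.normal(scale=0.5, size=(vocab, dim))  # posteriors -> hidden

    def forward(self, x, external=None):
        """Return final posteriors of shape (T, vocab).

        If `external` (T, vocab) is given, it replaces each layer's own
        intermediate prediction as the conditioning signal.
        """
        h = x
        for i, w in enumerate(self.layers):
            h = np.tanh(h @ w)
            if i < len(self.layers) - 1:
                own = softmax(h @ self.out)          # intermediate prediction
                cond = own if external is None else external
                h = h + cond @ self.back             # feed prediction back in
        return softmax(h @ self.out)


def greedy_ctc_decode(post, blank=0):
    """Collapse repeated labels, then drop blanks (best-path decoding)."""
    tokens, prev = [], blank
    for p in post.argmax(axis=-1):
        if p != prev and p != blank:
            tokens.append(int(p))
        prev = p
    return tokens


def multi_pass(model, x, n_passes=2):
    """Pass 1 uses self-conditioning; later passes condition on the previous output."""
    post = model.forward(x)
    for _ in range(n_passes - 1):
        one_hot = np.eye(post.shape[-1])[post.argmax(axis=-1)]
        post = model.forward(x, external=one_hot)
    return post


# Usage with random features of shape (frames, dim); weights are untrained.
features = np.random.default_rng(1).normal(size=(12, 8))
model = ToySelfConditionedCTC()
print(greedy_ctc_decode(multi_pass(model, features, n_passes=2)))
```

Because the weights are random, the decoded tokens are meaningless; the sketch only shows how the output of one inference pass can be reused as the conditioning input of the next.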
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
