E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model
W. Ronny Huang, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, David Rybach, Robert David, Rohit Prabhavalkar, Cyril Allauzen, Cal Peyser, Trevor D. Strohman

TL;DR
This paper presents a unified neural segmentation approach integrated with a two-pass cascaded encoder ASR model, achieving low latency and high-quality results for long-form captioning.
Contribution
It introduces a novel dummy frame injection strategy that lets a real-time end-of-segment (EOS) signal from the 1st pass finalize the non-causal 2nd pass without introducing user-perceived latency or deletion errors.
Findings
Achieved a 2.4% relative WER reduction on YouTube captioning over a baseline VAD-based segmenter.
Reduced EOS latency by 140 ms compared to the same baseline.
Demonstrated effective real-time segmentation in a cascaded ASR system.
Abstract
We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real-time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real-time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for simultaneous high quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve 2.4% relative WER and 140 ms EOS latency gains over a baseline VAD-based segmenter with…
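The finalization idea in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's implementation: the 2nd pass is modeled as a buffer that can emit a frame only once `right_context` later frames have arrived, frames are plain integers, and the dummy frame is a `None` placeholder. The names `LookaheadSecondPass`, `finalize_with_dummy_frames`, and `run_segment` are hypothetical, and the abstract does not say how dummy frames are constructed or how many are injected.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DUMMY = None  # hypothetical placeholder for a dummy frame (assumption)


@dataclass
class LookaheadSecondPass:
    """Toy stand-in for a non-causal 2nd pass that lags by `right_context` frames."""
    right_context: int
    buffer: List[Optional[int]] = field(default_factory=list)
    emitted: int = 0  # how many buffered frames have been processed so far

    def push(self, frame: Optional[int]) -> List[int]:
        """Add one frame and return the real frames that now have full right context."""
        self.buffer.append(frame)
        ready: List[int] = []
        while self.emitted + self.right_context < len(self.buffer):
            candidate = self.buffer[self.emitted]
            if candidate is not DUMMY:  # dummy frames never produce output
                ready.append(candidate)
            self.emitted += 1
        return ready

    def finalize_with_dummy_frames(self) -> List[int]:
        """On EOS, pad with dummy frames so every pending real frame is flushed."""
        flushed: List[int] = []
        for _ in range(self.right_context):
            flushed.extend(self.push(DUMMY))
        return flushed


def run_segment(frames: List[int], eos_index: int, right_context: int) -> List[int]:
    """Stream frames; when the 1st pass signals EOS, finalize the 2nd pass at once."""
    second_pass = LookaheadSecondPass(right_context)
    output: List[int] = []
    for i, frame in enumerate(frames):
        output.extend(second_pass.push(frame))
        if i == eos_index:  # EOS emitted in real time by the 1st pass segmenter
            output.extend(second_pass.finalize_with_dummy_frames())
            break
    return output
```

In this toy setup, the 2nd pass covers every real frame up to the EOS position without waiting for further audio, which is the property the abstract attributes to dummy frame injection. Without the injection step, the last `right_context` real frames of the segment would still be pending when EOS fires.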
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis · Machine Learning and Algorithms
