Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar, Anna Piunova, Christian Osendorfer

TL;DR
This paper introduces a streaming model for multi-party speech recognition and segmentation that integrates speech separation, recognition, and segmentation in a single model, improving accuracy and reducing latency in multi-turn conversations.
Contribution
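The work presents the separator-transducer-segmenter (STS) model, which combines a segmentation strategy based on start-of-turn and end-of-turn tokens, FastEmit emission regularization, multi-task training with speech activity information, and latency penalties for better multi-party speech processing.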
Findings
Achieved a 4.6% absolute improvement in turn counting accuracy
Reduced word error rate (WER) by 17% relative on the LibriCSS dataset
Improved segmentation without degrading recognition accuracy
Abstract
Streaming recognition and segmentation of multi-party conversations with overlapping speech is crucial for the next generation of voice assistant applications. In this work we address its challenges discovered in the previous work on multi-turn recurrent neural network transducer (MT-RNN-T) with a novel approach, separator-transducer-segmenter (STS), that enables tighter integration of speech separation, recognition and segmentation in a single model. First, we propose a new segmentation modeling strategy through start-of-turn and end-of-turn tokens that improves segmentation without recognition accuracy degradation. Second, we further improve both speech recognition and segmentation accuracy through an emission regularization method, FastEmit, and multi-task training with speech activity information as an additional training signal. Third, we experiment with end-of-turn emission…
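To make the start-of-turn and end-of-turn idea concrete, here is a minimal sketch of how token-delimited training targets might be built for overlapping turns. It is an illustration only, not the authors' implementation: the token strings `<sot>` and `<eot>`, the `Turn` type, the two-channel default, and the rule of placing each turn on the first free output channel are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical token names; the source does not specify the actual symbols.
SOT, EOT = "<sot>", "<eot>"


@dataclass
class Turn:
    start: float      # turn start time in seconds
    end: float        # turn end time in seconds
    words: List[str]  # reference transcript of the turn


def build_channel_targets(turns: List[Turn], num_channels: int = 2) -> List[List[str]]:
    """Place each turn on the first output channel that is free at its start
    time (an assumed assignment rule), wrapping it in SOT/EOT tokens."""
    channels: List[List[str]] = [[] for _ in range(num_channels)]
    busy_until = [float("-inf")] * num_channels
    for turn in sorted(turns, key=lambda t: t.start):
        free = next(
            (c for c in range(num_channels) if busy_until[c] <= turn.start),
            None,
        )
        if free is None:
            raise ValueError("more overlapping turns than output channels")
        channels[free].extend([SOT, *turn.words, EOT])
        busy_until[free] = turn.end
    return channels


def count_turns(channels: List[List[str]]) -> int:
    """Count turns as the number of end-of-turn tokens across all channels."""
    return sum(tokens.count(EOT) for tokens in channels)


turns = [
    Turn(0.0, 2.0, ["hello", "there"]),
    Turn(1.5, 3.0, ["hi"]),           # overlaps the first turn
    Turn(3.2, 4.0, ["bye"]),
]
targets = build_channel_targets(turns)
num_turns = count_turns(targets)
```

In this sketch the overlapping second turn lands on the second channel, and counting `<eot>` tokens gives a turn count that could be compared against the reference when measuring turn counting accuracy.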
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
