Investigation of Practical Aspects of Single Channel Speech Separation for ASR
Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

TL;DR
This paper improves single-channel speech separation for ASR by combining a two-stage training scheme with model compression, yielding lower WER on LibriCSS with lightweight models.
Contribution
It introduces a novel two-stage training approach and a modified teacher-student technique for model compression in speech separation for ASR.
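As a rough illustration of the teacher-student idea mentioned above, here is a minimal sketch of a generic distillation loss in plain Python. It is not the paper's modified technique, whose details are not given here. The function names, the `alpha` weight, and the use of mean squared error are all assumptions made for the example.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, reference=None, alpha=0.5):
    """Generic teacher-student loss (hypothetical helper, not from the paper).

    The small student model is pulled toward the output of a larger,
    already-trained teacher. If a ground-truth reference is available,
    the two error terms are mixed with weight `alpha` (an assumed knob).
    """
    loss_teacher = mse(student_out, teacher_out)
    if reference is None:
        # No ground truth: learn from the teacher's output alone.
        return loss_teacher
    return alpha * mse(student_out, reference) + (1 - alpha) * loss_teacher


# Toy vectors standing in for separated-speech features.
print(distillation_loss([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], alpha=0.5))
```

In a real system the lists would be feature tensors produced by the teacher and student separation networks, and the loss would be minimized with a deep learning framework.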
Findings
Achieved 2.70% absolute WER reduction on LibriCSS.
Developed a lightweight model with less than 10M parameters.
Demonstrated improved performance in both utterance-wise and continuous evaluation.
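To make the "absolute WER reduction" in the findings concrete, the sketch below computes word error rate from a word-level edit distance and contrasts absolute with relative reduction. The helper names and the example numbers are illustrative assumptions, not values from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent: word edits divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def absolute_reduction(baseline_wer, new_wer):
    """Difference in percentage points."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer, new_wer):
    """Difference as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Made-up numbers: a drop from 12.0% to 9.0% WER.
print(absolute_reduction(12.0, 9.0))  # 3.0 percentage points
print(relative_reduction(12.0, 9.0))  # 25.0 percent of the baseline
```

An absolute reduction is stated in percentage points, so the same absolute gain corresponds to a larger relative gain when the baseline WER is already low.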
Abstract
Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR). However, a speech separation model often introduces target speech distortion, resulting in a sub-optimum word error rate (WER). In this paper, we describe our efforts to improve the performance of a single channel speech separation system. Specifically, we investigate a two-stage training scheme that firstly applies a feature level optimization criterion for pretraining, followed by an ASR-oriented optimization criterion using an end-to-end (E2E) speech recognition model. Meanwhile, to keep the model light-weight, we introduce a modified teacher-student learning technique for model compression. By combining those approaches,…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
