Multi-resolution location-based training for multi-channel continuous speech separation
Hassan Taherian and DeLiang Wang

TL;DR
This paper introduces a multi-resolution location-based training method for multi-channel speech separation, improving automatic speech recognition in overlapped multi-talker scenarios by leveraging spatial information across resolutions.
Contribution
The paper extends location-based training to multi-resolution estimation, enhancing speaker separation in conversational environments with multi-channel recordings.
Findings
Multi-resolution LBT outperforms existing methods on the LibriCSS corpus.
Assigning convolutional kernels consistently by speaker location improves separation accuracy.
The method effectively handles reverberant and overlapped speech conditions.
Abstract
The performance of automatic speech recognition (ASR) systems severely degrades when multi-talker speech overlap occurs. In meeting environments, speech separation is typically performed to improve the robustness of ASR systems. Recently, location-based training (LBT) was proposed as a new training criterion for multi-channel talker-independent speaker separation. Assuming fixed array geometry, LBT outperforms widely-used permutation-invariant training in fully overlapped utterances and matched reverberant conditions. This paper extends LBT to conversational multi-channel speaker separation. We introduce multi-resolution LBT to estimate the complex spectrograms from low to high time and frequency resolutions. With multi-resolution LBT, convolutional kernels are assigned consistently based on speaker locations in physical space. Evaluation results show that multi-resolution LBT…
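The contrast the abstract draws between permutation-invariant training (PIT) and LBT can be illustrated with a toy loss computation. This is a minimal sketch and not the authors' implementation: the abstract says only that assignment follows speaker locations in physical space, so ordering speakers by ascending azimuth is an assumption made here, as are the function names `pit_loss` and `lbt_loss`, the MSE objective, and the use of short real-valued vectors in place of complex spectrograms. The multi-resolution part of the method is not sketched.

```python
from itertools import permutations
import random


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, targets):
    """PIT: take the minimum loss over every output-to-speaker assignment."""
    n = len(targets)
    return min(
        sum(mse(estimates[i], targets[p[i]]) for i in range(n)) / n
        for p in permutations(range(n))
    )


def lbt_loss(estimates, targets, azimuths_deg):
    """Location-based assignment (assumed: ascending azimuth).

    Output i is always compared with the i-th speaker by angle,
    so no search over permutations is needed.
    """
    n = len(targets)
    order = sorted(range(n), key=lambda k: azimuths_deg[k])
    return sum(mse(estimates[i], targets[order[i]]) for i in range(n)) / n


random.seed(0)
# Two speakers, each a toy 32-sample "spectrogram" (hypothetical data).
targets = [[random.gauss(0, 1) for _ in range(32)] for _ in range(2)]
azimuths_deg = [120.0, 30.0]  # hypothetical speaker angles

# Build estimates that already follow the azimuth ordering:
# output 0 corresponds to the speaker at 30 degrees.
order = sorted(range(2), key=lambda k: azimuths_deg[k])
estimates = [[x + random.gauss(0, 0.01) for x in targets[k]] for k in order]

loss_pit = pit_loss(estimates, targets)
loss_lbt = lbt_loss(estimates, targets, azimuths_deg)
```

When the outputs follow the location ordering, the fixed assignment gives the same loss that PIT finds by searching over permutations. If the outputs are swapped, the location-based loss increases while the PIT loss does not, which shows that a location-based criterion ties each output to a specific region of space.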
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
