Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
Yiwen Shao, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Daniel Povey, Sanjeev Khudanpur

TL;DR
This paper presents Solo-SF, a novel method that improves multi-channel, multi-speaker ASR by using a target speaker's isolated segment, achieving lower error rates without relying on microphone array configurations.
Contribution
Introduces Solo-SF, an approach that uses a target speaker's solo speech segment to improve target-speaker recognition in multi-channel ASR, bypassing the spatial information that traditional methods require.
Findings
Solo-SF achieves a lower Character Error Rate (CER) than existing methods.
Effective solo segment selection strategies are crucial for Solo-SF.
Solo-SF is robust across datasets and noise conditions.
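For readers unfamiliar with the metric, CER is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the number of reference characters. The sketch below illustrates that standard definition only; it is not the authors' evaluation code, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.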
Abstract
In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on microphone array configurations and the information of the target speaker's location or voiceprint. This study introduces the Solo Spatial Feature (Solo-SF), an innovative method that utilizes a target speaker's isolated speech segment to enhance ASR performance, thereby circumventing the need for conventional inputs like microphone array layouts. We explore effective strategies for selecting optimal solo segments, a crucial aspect for Solo-SF's success. Through evaluations conducted on the AliMeeting dataset and AISHELL-1 simulations, Solo-SF demonstrates superior performance over existing techniques, significantly lowering Character Error…
Taxonomy
Topics: Speech Recognition and Synthesis
