Exploring Efficient Directional and Distance Cues for Regional Speech Separation
Yiheng Jiang, Haoxu Wang, Yafeng Chen, Gang Qiao, Biao Tian

TL;DR
This paper presents a neural network approach for regional speech separation that uses novel spatial cues, including directional and distance information, to improve separation accuracy in real-world scenarios.
Contribution
The paper introduces a new neural network-based method that uses enhanced spatial cues, covering both direction and distance, for more effective regional speech separation.
Findings
Achieves significant improvements on multiple objective metrics.
Attains state-of-the-art results on the CHiME-8 MMCSG dataset.
Effectively discriminates sources based on direction and distance.
Abstract
In this paper, we introduce a neural network-based method for regional speech separation using a microphone array. This approach leverages novel spatial cues to extract the sound source not only from a specified direction but also within a defined distance. Specifically, our method employs an improved delay-and-sum technique to obtain directional cues, substantially enhancing the signal from the target direction. We further enhance separation by incorporating the direct-to-reverberant ratio (DRR) into the input features, enabling the model to better discriminate sources within and beyond a specified distance. Experimental results demonstrate that our proposed method yields substantial gains across multiple objective metrics. Furthermore, our method achieves state-of-the-art performance on the CHiME-8 MMCSG dataset, which was recorded in real-world conversational scenarios, underscoring its…
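The abstract names two spatial cues: a delay-and-sum output for direction and the direct-to-reverberant ratio for distance. The paper's "improved" delay-and-sum and its exact DRR feature are not described here, so the following is only a minimal sketch of the conventional versions, assuming a far-field plane-wave model, a speed of sound of 343 m/s, and an oracle DRR computed from a known room impulse response (as is possible when simulating training data). The 2.5 ms direct-path window is a common convention, not a value from the paper, and all function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def steering_vector(mic_pos, direction, freqs, c=SPEED_OF_SOUND):
    """Far-field steering vectors, shape (F, M).

    mic_pos: (M, 3) microphone coordinates in metres.
    direction: (3,) vector pointing from the array toward the source.
    freqs: (F,) bin centre frequencies in Hz.
    """
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    # A plane wave from direction u reaches mic m earlier by (p_m . u) / c seconds
    advance = mic_pos @ u / c  # (M,)
    return np.exp(2j * np.pi * np.outer(freqs, advance))  # (F, M)


def delay_and_sum(stft, mic_pos, direction, fs, n_fft, c=SPEED_OF_SOUND):
    """Conventional (not the paper's improved) frequency-domain delay-and-sum.

    stft: (M, F, T) complex multichannel STFT with F = n_fft // 2 + 1.
    Returns a (F, T) single-channel STFT steered toward `direction`.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)  # (F,)
    d = steering_vector(mic_pos, direction, freqs, c)  # (F, M)
    # Undo each channel's phase advance, then average across microphones
    return np.einsum("fm,mft->ft", d.conj(), stft) / mic_pos.shape[0]


def direct_to_reverberant_ratio(rir, fs, direct_window_ms=2.5):
    """Oracle DRR in dB from a room impulse response.

    Direct path: samples within +/- direct_window_ms of the RIR peak.
    Reverberation: all samples after that window (earlier samples are ignored).
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), peak + half + 1
    direct_energy = np.sum(rir[start:stop] ** 2)
    reverb_energy = np.sum(rir[stop:] ** 2)
    return 10.0 * np.log10(direct_energy / reverb_energy)
```

Steering the beamformer at the true source direction re-aligns the channels so they add coherently, while other directions are attenuated; a nearby talker typically has a higher DRR than a distant one in the same room, which is why DRR can serve as a distance cue.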
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Hearing Loss and Rehabilitation
