Exploring Efficient Directional and Distance Cues for Regional Speech Separation

Yiheng Jiang; Haoxu Wang; Yafeng Chen; Gang Qiao; Biao Tian

arXiv:2508.07563 · cs.SD · August 12, 2025

TL;DR

This paper presents a neural network approach for regional speech separation that uses novel spatial cues, including directional and distance information, to improve separation accuracy in real-world scenarios.

Contribution

The paper introduces a neural network-based method for regional speech separation that combines directional cues from an improved delay-and-sum technique with the direct-to-reverberant ratio as a distance cue.

Findings

01. Achieves significant improvements on multiple objective metrics.

02. Attains state-of-the-art results on the CHiME-8 MMCSG dataset.

03. Effectively discriminates sources based on direction and distance.

Abstract

In this paper, we introduce a neural network-based method for regional speech separation using a microphone array. This approach leverages novel spatial cues to extract the sound source not only from a specified direction but also within a defined distance. Specifically, our method employs an improved delay-and-sum technique to obtain directional cues, substantially enhancing the signal from the target direction. We further enhance separation by incorporating the direct-to-reverberant ratio into the input features, enabling the model to better discriminate sources within and beyond a specified distance. Experimental results demonstrate that our proposed method leads to substantial gains across multiple objective metrics. Furthermore, our method achieves state-of-the-art performance on the CHiME-8 MMCSG dataset, which was recorded in real-world conversational scenarios, underscoring its…
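
The abstract names two cues, delay-and-sum for direction and the direct-to-reverberant ratio (DRR) for distance, but this page does not give their formulas. As a rough illustration only, the sketch below shows the textbook far-field delay-and-sum beamformer in the STFT domain and a common DRR definition computed from a room impulse response. It is not the paper's improved delay-and-sum variant, and the paper's DRR feature may be estimated quite differently. All function names, the speed-of-sound constant, the far-field assumption, and the 2.5 ms direct-path window are assumptions introduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, azimuth_rad):
    """Relative arrival delays (seconds) for a far-field source.

    mic_positions: (mics, 3) array of coordinates in metres.
    azimuth_rad: source direction in the horizontal plane.
    A microphone displaced toward the source hears the wavefront
    earlier, so its delay is negative.
    """
    direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad), 0.0])
    return -(mic_positions @ direction) / SPEED_OF_SOUND


def delay_and_sum(stft, freqs_hz, delays_s):
    """Classic delay-and-sum beamformer (hypothetical helper).

    stft: complex array of shape (mics, freqs, frames).
    freqs_hz: centre frequency of each STFT bin, shape (freqs,).
    delays_s: per-microphone delays from steering_delays, shape (mics,).
    Returns the beamformed STFT of shape (freqs, frames).
    """
    # A delay of tau appears as exp(-j*2*pi*f*tau) in the frequency
    # domain, so multiplying by exp(+j*2*pi*f*tau) re-aligns the channels.
    phase = np.exp(2j * np.pi * freqs_hz[None, :] * delays_s[:, None])
    return np.mean(stft * phase[:, :, None], axis=0)


def drr_db(rir, sample_rate, direct_window_ms=2.5):
    """DRR in dB from a room impulse response (hypothetical helper).

    Energy within +/- direct_window_ms of the strongest peak counts as
    the direct path; everything after that window counts as reverberant.
    """
    peak = int(np.argmax(np.abs(rir)))
    half = int(round(direct_window_ms * 1e-3 * sample_rate))
    start = max(peak - half, 0)
    end = min(peak + half + 1, len(rir))
    direct_energy = np.sum(rir[start:end] ** 2)
    reverb_energy = np.sum(rir[end:] ** 2)
    # Small constant avoids division by zero for an anechoic response.
    return 10.0 * np.log10(direct_energy / (reverb_energy + 1e-12))
```

Steering toward the true source direction adds the channels coherently, while steering elsewhere partially cancels them, which is why the beamformed signal can serve as a directional cue. Under this definition a nearby source yields a higher DRR than a distant one in the same room, which is the intuition behind using DRR as a distance cue.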

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Hearing Loss and Rehabilitation