Single-Channel Target Speech Extraction Utilizing Distance and Room Clues

Runwu Shi; Zirui Lin; Benjamin Yen; Jiang Wang; Ragib Amin Nihal; Kazuhiro Nakadai

arXiv:2505.14433 · eess.AS · May 21, 2025

Open Access

TL;DR

This paper introduces a single-channel target speech extraction (TSE) method that uses distance clues together with room information (room dimensions and reverberation time), improving generalization across different acoustic settings.

Contribution

It proposes a distance and room environment embedding approach for TSE, enhancing robustness to room variations and demonstrating effectiveness on simulated and real data.

Findings

01 Effective in both simulated and real environments

02 Improves generalization across different room acoustics

03 Utilizes learnable embeddings for distance and room features

Abstract

This paper aims to achieve single-channel target speech extraction (TSE) in enclosures by utilizing distance clues and room information. Recent works have verified the feasibility of distance clues for the TSE task: a distance clue implies the sound source's direct-to-reverberation ratio (DRR) and can therefore be utilized in speech separation and TSE systems. However, such a distance clue is significantly influenced by the room's acoustic characteristics, such as its dimensions and reverberation time, which makes it challenging for TSE systems that rely solely on distance clues to generalize across different rooms. To solve this, we suggest providing room environmental information (room dimensions and reverberation time) to distance-based TSE for better generalization. Specifically, we propose a distance and environment-based TSE model in the time-frequency (TF) domain with learnable…
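The abstract's point that a given distance implies a different DRR in different rooms can be illustrated with a toy calculation. The sketch below is not the paper's method: it builds a synthetic room impulse response from a 1/d direct impulse plus an exponentially decaying noise tail, then measures DRR around the direct peak. The function names (`synthetic_rir`, `drr_db`), the tail gain, and the 2.5 ms direct-path window are all assumptions chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def synthetic_rir(distance_m, rt60_s, fs=16000, length_s=1.0,
                  tail_gain=0.02, seed=0):
    """Toy RIR (hypothetical model): 1/d direct impulse plus a decaying noise tail."""
    rng = np.random.default_rng(seed)
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude envelope whose energy falls by 60 dB at t = rt60_s.
    rir = tail_gain * rng.standard_normal(n) * 10.0 ** (-3.0 * t / rt60_s)
    onset = int(round(distance_m / SPEED_OF_SOUND * fs))
    rir[:onset + 1] = 0.0                      # nothing arrives before the direct sound
    rir[onset] = 1.0 / max(distance_m, 1e-3)   # direct path with 1/d spreading loss
    return rir


def drr_db(rir, fs=16000, direct_window_ms=2.5):
    """DRR in dB: energy near the direct peak over the energy after it."""
    peak = int(np.argmax(np.abs(rir)))
    half = int(direct_window_ms * 1e-3 * fs)
    lo, hi = max(peak - half, 0), peak + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    reverb = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / reverb)


if __name__ == "__main__":
    # Same room, different distances: DRR drops as the source moves away.
    for d in (0.5, 1.0, 2.0, 3.0):
        print(f"d={d:.1f} m, RT60=0.5 s: DRR={drr_db(synthetic_rir(d, 0.5)):.1f} dB")
    # Same distance, different rooms: DRR changes with reverberation time.
    for rt60 in (0.3, 0.5, 0.8):
        print(f"d=2.0 m, RT60={rt60:.1f} s: DRR={drr_db(synthetic_rir(2.0, rt60)):.1f} dB")
```

In this toy model, a source at a fixed 2 m yields a noticeably different DRR when only the reverberation time changes, which is consistent with the motivation for supplying room information alongside the distance clue.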

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing