Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information
Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Zhiyong, Yan, Yongqing Wang, Junbo Zhang, Yujun Wang

TL;DR
This paper introduces a monaural target speaker extraction method that utilizes distance and speaker information, achieving robust performance in noisy, reverberant environments without prior speaker enrollment.
Contribution
The paper proposes the NS-Extractor leveraging distance info and self-enrollment, with full- and sub-band modeling to improve robustness in reverberant conditions.
Findings
Effective in cross-dataset evaluations
Outperforms existing methods in noisy environments
Robust to reverberation effects
Abstract
Previously, Target Speaker Extraction (TSE) has yielded outstanding performance in certain application scenarios for speech enhancement and source separation. However, obtaining auxiliary speaker-related information is still challenging in noisy environments with significant reverberation. inspired by the recently proposed distance-based sound separation, we propose the near sound (NS) extractor, which leverages distance information for TSE to reliably extract speaker information without requiring previous speaker enrolment, called speaker embedding self-enrollment (SESE). Full- & sub-band modeling is introduced to enhance our NS-Extractor's adaptability towards environments with significant reverberation. Experimental results on several cross-datasets demonstrate the effectiveness of our improvements and the excellent performance of our proposed NS-Extractor in different application…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
