Robust Target Speaker Direction of Arrival Estimation
Zixuan Li, Shulin He, Xueliang Zhang

TL;DR
This paper introduces RTS-DOA, a robust real-time system for estimating the direction of a target speaker in challenging multi-speaker environments, utilizing speech enhancement, spatial, and speaker modules.
Contribution
The paper presents a novel RTS-DOA system that improves target speaker DOA estimation by integrating speech enhancement, spatial learning, and voiceprint features in real time.
Findings
Outperforms existing methods in multi-speaker scenarios
Sets new benchmark results on the LibriSpeech dataset
Effectively handles noise and reverberation
Abstract
In multi-speaker environments, the direction of arrival (DOA) of a target speaker is key for improving speech clarity and extracting the target speaker's voice. However, traditional DOA estimation methods often struggle in the presence of noise and reverberation, and particularly when competing speakers are present. To address these challenges, we propose RTS-DOA, a robust real-time DOA estimation system. This system innovatively uses the registered speech of the target speaker as a reference and leverages full-band and sub-band spectral information from a microphone array to estimate the DOA of the target speaker's voice. Specifically, the system comprises a speech enhancement module for initially improving speech quality, a spatial module for learning spatial information, and a speaker module for extracting voiceprint features. Experimental results on the LibriSpeech dataset demonstrate that…
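For context on the "traditional DOA estimation methods" the abstract contrasts itself with, the following is a minimal sketch of one classical approach, GCC-PHAT on a single two-microphone pair. It is not the RTS-DOA system and is not taken from the paper. The function names, the 0.1 m microphone spacing, the 16 kHz sample rate, the far-field assumption, and the speed of sound are all illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` lags `ref`. Hypothetical helper for
    illustration only.
    """
    n = len(sig) + len(ref)  # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_to_angle(tau, mic_distance):
    """Convert a delay to a far-field angle in degrees.

    Convention (an assumption of this sketch): 0 degrees is endfire on the
    reference-microphone side, 90 degrees is broadside.
    """
    cos_theta = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


if __name__ == "__main__":
    fs = 16000           # sample rate in Hz (assumed)
    mic_distance = 0.1   # microphone spacing in metres (assumed)
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)                # signal at microphone 1
    sig = np.concatenate((np.zeros(3), ref[:-3]))  # microphone 2, 3 samples late
    tau = gcc_phat_delay(sig, ref, fs, max_tau=mic_distance / SPEED_OF_SOUND)
    print(tau * fs, delay_to_angle(tau, mic_distance))
```

With a single clean source this kind of estimator tends to work well. With noise, reverberation, or a competing speaker, the correlation peak can belong to the wrong source, which is the failure mode the abstract says RTS-DOA targets by conditioning on the target speaker's registered speech.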
Taxonomy
Topics: Speech and Audio Processing · Direction-of-Arrival Estimation Techniques · Radar Systems and Signal Processing
