Robust Target Speaker Direction of Arrival Estimation

Zixuan Li; Shulin He; Xueliang Zhang

arXiv:2412.18913 · cs.SD · December 30, 2024

TL;DR

This paper introduces RTS-DOA, a robust real-time system that estimates the direction of arrival (DOA) of a target speaker in challenging multi-speaker environments by combining a speech enhancement module, a spatial module, and a speaker module.

Contribution

The paper presents RTS-DOA, a system that improves target speaker DOA estimation by integrating speech enhancement, spatial learning, and voiceprint features while running in real time.

Findings

1. Outperforms existing methods in multi-speaker scenarios
2. Achieves new benchmark results on the LibriSpeech dataset
3. Effectively handles noise and reverberation

Abstract

In multi-speaker environments, the direction of arrival (DOA) of a target speaker is key to improving speech clarity and extracting the target speaker's voice. However, traditional DOA estimation methods often struggle in the presence of noise and reverberation, and particularly when competing speakers are present. To address these challenges, we propose RTS-DOA, a robust real-time DOA estimation system. The system uses the registered speech of the target speaker as a reference and leverages full-band and sub-band spectral information from a microphone array to estimate the DOA of the target speaker's voice. Specifically, it comprises a speech enhancement module that first improves speech quality, a spatial module that learns spatial information, and a speaker module that extracts voiceprint features. Experimental results on the LibriSpeech dataset demonstrate that…
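
For context on the "traditional DOA estimation methods" the abstract refers to, the following is a minimal sketch of one classical baseline, GCC-PHAT, applied to a single two-microphone pair. It is not the RTS-DOA method and is not taken from the paper. The function names, the 343 m/s speed of sound, and the far-field geometry (angle measured from the axis joining the two microphones) are assumptions made for illustration only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def gcc_phat_delay(sig, ref, fs, max_tau):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    Hypothetical helper for illustration. A positive result means `sig`
    arrives later than `ref`.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear, not circular
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    # Only search lags that are physically possible for the given mic spacing.
    max_shift = max(1, min(int(fs * max_tau), n // 2))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def doa_from_delay(tau, mic_distance):
    """Convert a delay to a far-field angle in degrees (0 to 180).

    Assumes a plane wave and the relation tau = mic_distance * cos(theta) / c,
    with theta measured from the axis joining the two microphones.
    """
    cos_theta = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


if __name__ == "__main__":
    fs, mic_distance = 16000, 0.1  # assumed sample rate (Hz) and spacing (m)
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)
    sig = np.concatenate((np.zeros(3), ref[:-3]))  # `sig` lags `ref` by 3 samples
    tau = gcc_phat_delay(sig, ref, fs, max_tau=mic_distance / SPEED_OF_SOUND)
    print(f"delay: {tau * 1e6:.1f} us, angle: {doa_from_delay(tau, mic_distance):.1f} deg")
```

A single correlation peak of this kind gives no way to tell which speaker produced it, which is one reason such baselines degrade when competing speakers are present. The abstract's use of registered target speech as a reference addresses that ambiguity.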

Taxonomy

Topics: Speech and Audio Processing · Direction-of-Arrival Estimation Techniques · Radar Systems and Signal Processing