Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification

Wei Rao; Chenglin Xu; Eng Siong Chng; Haizhou Li

arXiv:1902.02546 · eess.AS · February 8, 2019 · 5 cites

TL;DR

This paper introduces a framework that improves multi-talker speaker verification by first extracting the target speaker's speech from overlapped audio and then verifying the extracted speech, achieving a 65.7% relative EER reduction.

Contribution

It proposes a novel approach combining target speaker extraction with verification, addressing spectral overlap issues in multi-talker scenarios.

Findings

01

Achieves 65.7% relative EER reduction in overlapped speaker verification.

02

Significantly improves verification accuracy in multi-talker environments.

03

Demonstrates that target speaker extraction is effective at separating simultaneous talkers in the spectral domain.
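
The relative EER reduction quoted in the findings can be made concrete with a short sketch. The function names `compute_eer` and `relative_reduction` are illustrative and do not come from the paper, the threshold sweep is one simple way to approximate the EER, and the scores in the usage note are toy values rather than the paper's results.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate (EER) by sweeping observed scores as thresholds.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where false acceptance and false rejection rates are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = float("inf"), 1.0
    for t in np.unique(np.concatenate([target, nontarget])):
        far = float(np.mean(nontarget >= t))  # impostor trials wrongly accepted
        frr = float(np.mean(target < t))      # genuine trials wrongly rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction in percent: 100 * (baseline - proposed) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer
```

For example, `relative_reduction(20.0, 10.0)` gives 50.0. By the same formula, a 65.7% relative reduction means the remaining EER is 34.3% of the baseline EER, whatever the baseline value is.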

Abstract

The performance of speaker verification degrades significantly when the test speech is corrupted by interfering speakers. Speaker diarization separates speakers well when they do not overlap in time. However, when multiple talkers speak at the same time, a technique is needed to separate the speech in the spectral domain. This paper proposes an overlapped multi-talker speaker verification framework that uses target speaker extraction methods. Specifically, given the target speaker information, the target speaker's speech is first extracted from the overlapped multi-talker speech by a target speaker extraction module. The extracted speech is then passed to the speaker verification system. Experimental results show that the proposed approach significantly improves the performance of overlapped multi-talker speaker verification and achieves a 65.7% relative EER reduction.
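
The two-stage flow described in the abstract (extract the target speaker, then verify) can be sketched as a small interface. This is a minimal sketch under stated assumptions, not the authors' implementation: `extract` and `embed` are hypothetical callables standing in for the extraction module and a speaker embedding front end, the enrollment audio is assumed to serve as the target speaker information, and cosine scoring is a stand-in for whatever verification back end the paper uses.

```python
from typing import Callable, Tuple

import numpy as np

Array = np.ndarray


def cosine_score(a: Array, b: Array) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def verify_overlapped(
    mixture: Array,
    enrollment: Array,
    extract: Callable[[Array, Array], Array],  # hypothetical extraction module
    embed: Callable[[Array], Array],           # hypothetical speaker embedder
    threshold: float,
) -> Tuple[float, bool]:
    """Extract the claimed speaker from the mixture, then score the result."""
    # Stage 0: derive target speaker information from the enrollment audio (assumption).
    target_embedding = embed(enrollment)
    # Stage 1: target speaker extraction conditioned on that information.
    extracted = extract(mixture, target_embedding)
    # Stage 2: verification on the extracted speech instead of the raw mixture.
    score = cosine_score(embed(extracted), target_embedding)
    return score, score >= threshold
```

Passing the two stages in as callables keeps the sketch independent of any particular extraction network or verification system. Any pair of models with these input and output shapes could be plugged in.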

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing