Towards Robust Speaker Verification with Target Speaker Enhancement

Chunlei Zhang; Meng Yu; Chao Weng; Dong Yu

arXiv:2103.08781 · eess.AS · March 17, 2021 · ICASSP

TL;DR

This paper introduces TASE-SVNet, a neural network that combines target speaker enhancement and embedding extraction to improve robustness in speaker verification, especially in noisy and overlapped speech scenarios.

Contribution

The paper presents a neural model with a speaker-conditioned enhancement front-end, a nontarget speaker sampling strategy, teacher-student training that distills a large embedding model into a lightweight one, and iterative inference for noisy enrollment.

Findings

01. Significant EER reduction over baselines in overlapped speech scenarios.

02. Effective nontarget speaker suppression improves verification accuracy.

03. Iterative inference enhances robustness in noisy environments.
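
The findings above are stated in terms of equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how that metric is commonly computed from trial scores, here is a minimal sketch. It is not the paper's evaluation code: the function name `compute_eer`, the convention that a higher score means "same speaker", and the simple threshold sweep are all assumptions made for this example.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Return (eer, threshold) for a set of verification trial scores.

    Assumption: a higher score means the trial is more likely a target
    (same-speaker) trial. The threshold is swept over all observed scores.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of nontarget trials scored at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]
```

With finitely many trials the two error curves rarely cross exactly, so this sketch reports the average of the two rates at the closest threshold; evaluation toolkits often interpolate instead.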

Abstract

This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all-neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV). Specifically, an enrollment-speaker-conditioned speech enhancement module is employed as the front-end to extract the target speaker from a mixture with interfering speakers and environmental noise. Compared with conventional target speaker enhancement models, SV requires additional attention to nontarget speaker/interference suppression. Therefore, an effective nontarget speaker sampling strategy is explored. To improve speaker embedding extraction with a lightweight model, teacher-student (T/S) training is proposed to distill speaker-discriminative information from large models to small models. Iterative inference is investigated to address…
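
The abstract mentions teacher-student (T/S) training without giving the loss used. As a hedged illustration of the general idea only, the sketch below computes a common distillation objective: the KL divergence between temperature-softened speaker posteriors of a large teacher and a small student. The choice of KL divergence, the `temperature` parameter, and the function names are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) over a batch of speaker-class logits.

    Hypothetical objective for illustration: the student is pushed to
    match the teacher's softened posterior over training speakers.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
        axis=-1,
    )
    return float(kl.mean())
```

In practice a term like this is usually combined with the ordinary speaker-classification loss on the student; whether and how the paper does so is not stated in the text above.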

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing