SEEF-ALDR: A Speaker Embedding Enhancement Framework via Adversarial Learning based Disentangled Representation

Jianwei Tai; Xiaoqi Jia; Qingjia Huang; Weijuan Zhang; Haichao Du; Shengzhi Zhang

arXiv:1912.02608·eess.AS·October 27, 2020

TL;DR

SEEF-ALDR enhances speaker verification by using adversarial learning to disentangle speaker identity from irrelevant information, significantly improving accuracy across multiple baseline models without altering their structure.

Contribution

This work introduces a modular adversarial learning framework that effectively isolates speaker identity features, boosting verification performance without changing existing models.

Findings

01

Achieved an average performance improvement of over 20% on the VoxCeleb datasets.

02

Framework is compatible with various baseline models without structural modifications.

03

Ablation study confirms the effectiveness of each module.

Abstract

Speaker verification, as a biometric authentication mechanism, has been widely used due to the pervasiveness of voice control on smart devices. However, "in-the-wild" speaker verification remains challenging, because speech samples may contain a great deal of identity-unrelated information, e.g., background noise, reverberation, and emotion. Previous works focus on optimizing the model to improve verification accuracy, without accounting for the impact of this identity-unrelated information. To address this problem, we propose SEEF-ALDR, a novel Speaker Embedding Enhancement Framework via Adversarial Learning based Disentangled Representation, to improve the performance of existing models on speaker verification. The key idea is to retrieve as much speaker identity information as possible from the original speech, thus minimizing the impact of…
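The summary describes adversarial disentanglement only at a high level. The following is a minimal, generic sketch of how such competing objectives are often combined, not the authors' architecture. It assumes a hypothetical setup in which one encoder produces an identity embedding, a second produces an identity-unrelated embedding, a decoder reconstructs the input from both, and an adversary tries to predict the speaker from the identity-unrelated embedding. The function names, the reconstruction term, and the weight `lam` are illustrative assumptions that do not come from the paper.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    # Negative log-likelihood of the true class index under softmax(logits).
    return -math.log(softmax(logits)[label])


def disentanglement_losses(id_logits, adv_logits, speaker, recon_error, lam=0.5):
    """Hypothetical min-max objectives for adversarial disentanglement.

    id_logits:   speaker-classifier logits computed from the identity embedding
    adv_logits:  logits of an adversary that tries to recover the speaker
                 from the identity-unrelated embedding
    speaker:     index of the true speaker
    recon_error: reconstruction loss of the decoder (assumed precomputed)
    lam:         weight on the adversarial term (assumed hyperparameter)
    """
    id_loss = cross_entropy(id_logits, speaker)
    adv_loss = cross_entropy(adv_logits, speaker)
    # The adversary is trained to minimize its own speaker-classification loss.
    adversary_objective = adv_loss
    # The encoders minimize identity loss plus reconstruction while maximizing
    # the adversary's loss, pushing speaker information out of the
    # identity-unrelated embedding and into the identity embedding.
    encoder_objective = id_loss + recon_error - lam * adv_loss
    return encoder_objective, adversary_objective


# Usage: four speakers, uninformative logits for both heads.
enc, adv = disentanglement_losses([0.0] * 4, [0.0] * 4, speaker=2, recon_error=0.1)
print(round(enc, 4), round(adv, 4))
```

Subtracting `lam * adv_loss` in the encoder objective is the same min-max that a gradient reversal layer implements in a single backward pass. Because the adversary's cross-entropy is unbounded above, some implementations instead push the adversary's prediction toward a uniform distribution; which variant SEEF-ALDR uses is not stated in this summary.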
