A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification
Xujiang Xing, Mingxing Xu, Thomas Fang Zheng

TL;DR
This paper proposes an adversarial learning framework with noise disentanglement for robust speaker verification. It learns noise-independent speaker embeddings to improve verification performance in noisy environments.
Contribution
It proposes a joint noise-disentanglement and adversarial training framework that makes speaker verification more robust in noisy conditions.
Findings
Improved speaker verification accuracy in noisy conditions.
Effective separation of speaker and noise information in embeddings.
Robustness demonstrated on the VoxCeleb1 dataset.
Abstract
Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise disentanglement to establish a noise-independent, speaker-invariant embedding space. Specifically, the disentanglement module includes two encoders that separate speaker-related and speaker-irrelevant information, respectively. The reconstruction module serves as a regularization term to constrain the noise. A feature-robust loss is also used to supervise the speaker encoder so that it learns noise-independent speaker embeddings without losing speaker information. In addition, adversarial training is introduced to discourage the speaker encoder from encoding acoustic condition information, in order to achieve a speaker-invariant embedding space. Experiments on VoxCeleb1 indicate that the proposed method improves the…
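The abstract names four training signals (speaker supervision, reconstruction, a feature-robust loss, and an adversarial term) without giving their formulas. The sketch below shows one plausible way such terms could be combined; it is not the authors' implementation. Everything in it is an assumption made for illustration: the linear encoders, the use of mean squared error for the reconstruction and feature-robust terms, the cross-entropy condition classifier, the loss weights, and all names such as `joint_losses` and `condition_clf`.

```python
import math
import random


def linear(weights, x):
    """Apply a bias-free linear map; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_losses(params, clean, noisy, speaker_id, condition_id,
                 weights=(1.0, 1.0, 1.0, 0.1)):
    """Hypothetical joint objective; loss forms and weights are assumptions."""
    spk_clean = linear(params["speaker_enc"], clean)   # speaker code, clean input
    spk_noisy = linear(params["speaker_enc"], noisy)   # speaker code, noisy input
    noise_code = linear(params["noise_enc"], noisy)    # speaker-irrelevant code

    # Reconstruction: decode the concatenated codes back to the noisy input.
    recon = linear(params["decoder"], spk_noisy + noise_code)

    l_spk = cross_entropy(linear(params["speaker_clf"], spk_noisy), speaker_id)
    l_rec = mse(recon, noisy)
    # Feature-robust term (assumed form): pull noisy embedding toward clean one.
    l_rob = mse(spk_noisy, spk_clean)
    # Adversarial term: a classifier tries to predict the acoustic condition.
    l_adv = cross_entropy(linear(params["condition_clf"], spk_noisy), condition_id)

    w_spk, w_rec, w_rob, w_adv = weights
    return {
        "speaker": l_spk,
        "reconstruction": l_rec,
        "feature_robust": l_rob,
        "adversarial": l_adv,
        # The encoder is rewarded when the condition classifier fails.
        "encoder_objective": w_spk * l_spk + w_rec * l_rec + w_rob * l_rob - w_adv * l_adv,
        # The condition classifier simply minimizes its own loss.
        "discriminator_objective": l_adv,
    }


rng = random.Random(0)
dim, emb, noise_dim, n_speakers, n_conditions = 8, 4, 3, 5, 2
params = {
    "speaker_enc": rand_matrix(rng, emb, dim),
    "noise_enc": rand_matrix(rng, noise_dim, dim),
    "decoder": rand_matrix(rng, dim, emb + noise_dim),
    "speaker_clf": rand_matrix(rng, n_speakers, emb),
    "condition_clf": rand_matrix(rng, n_conditions, emb),
}
clean = [rng.gauss(0.0, 1.0) for _ in range(dim)]
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]  # toy additive noise
losses = joint_losses(params, clean, noisy, speaker_id=2, condition_id=1)
print(losses)
```

The sign flip on the adversarial term is one common way to express the min-max game: the condition classifier minimizes its loss while the speaker encoder is penalized when that classifier succeeds. A gradient reversal layer would achieve a similar effect in an autograd framework; which variant the paper uses is not stated in the abstract.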
