Disentangled speaker and nuisance attribute embedding for robust speaker verification

Woo Hyun Kang; Sung Hwan Mun; Min Hyun Han; Nam Soo Kim

arXiv:2008.03024 · eess.AS · August 10, 2020

TL;DR

This paper introduces a supervised learning approach that generates speaker embeddings disentangled from nuisance attributes such as recording channel and emotional state, making speaker verification more robust across varied recording conditions.

Contribution

A novel, fully supervised training method that disentangles speaker information from nuisance attributes in the embedding, improving robustness under diverse speech conditions.

Findings

01 Speaker embeddings robust to channel variability

02 Effective disentanglement of speaker and nuisance attributes

03 Improved verification accuracy on the RSR2015 and VoxCeleb1 datasets

Abstract

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods on the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.
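The abstract describes the method only at a high level. As an illustration, here is a minimal sketch of one common way to disentangle a speaker embedding from a nuisance embedding under full supervision. It assumes that each utterance carries both a speaker label and a nuisance label (e.g. recording device or emotion) and that a two-branch encoder produces a speaker embedding and a nuisance embedding, each fed to a speaker classifier and a nuisance classifier. The entropy-maximization terms, the weight `lam`, and all function names are hypothetical choices for illustration, not the authors' published objective. Only the encoder-side loss value is computed here; in adversarial setups the "cross" classifiers are typically updated separately to predict the other attribute.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy(logits, labels):
    # Mean negative log-likelihood of integer class labels.
    logp = log_softmax(logits)
    return -logp[np.arange(len(labels)), labels].mean()


def mean_entropy(logits):
    # Mean entropy (in nats) of the predicted class distributions.
    logp = log_softmax(logits)
    return -(np.exp(logp) * logp).sum(axis=-1).mean()


def disentangle_loss(spk_logits_from_spk, nui_logits_from_spk,
                     nui_logits_from_nui, spk_logits_from_nui,
                     spk_labels, nui_labels, lam=0.1):
    """Hypothetical encoder-side objective for a two-branch embedding.

    *_from_spk: classifier outputs computed from the speaker embedding.
    *_from_nui: classifier outputs computed from the nuisance embedding.
    """
    # Each embedding should predict its own attribute...
    task = (cross_entropy(spk_logits_from_spk, spk_labels)
            + cross_entropy(nui_logits_from_nui, nui_labels))
    # ...and leave the other attribute maximally uncertain, so the
    # entropy of the cross predictions is rewarded (subtracted).
    confusion = (mean_entropy(nui_logits_from_spk)
                 + mean_entropy(spk_logits_from_nui))
    return task - lam * confusion


if __name__ == "__main__":
    # Toy batch: 2 utterances, 3 speakers, 2 nuisance classes.
    own_spk = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
    own_nui = np.array([[10.0, 0.0], [0.0, 10.0]])
    spk_y, nui_y = np.array([0, 1]), np.array([0, 1])
    # Cross predictions are uniform: embeddings reveal nothing about the other attribute.
    clean = disentangle_loss(own_spk, np.zeros((2, 2)), own_nui,
                             np.zeros((2, 3)), spk_y, nui_y)
    # Cross predictions are confident: embeddings leak the other attribute.
    leaky = disentangle_loss(own_spk, own_nui, own_nui, own_spk, spk_y, nui_y)
    print(clean, leaky)  # uniform cross-predictions give the lower loss
```

Subtracting the entropy term pushes the speaker embedding toward a state where a nuisance classifier can do no better than chance. This is one interpretation of "disentangled from the variability caused by nuisance attributes"; other formulations (e.g. gradient reversal or mutual-information penalties) are equally plausible.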
