Coupling a generative model with a discriminative learning framework for speaker verification

Xugang Lu; Peng Shen; Yu Tsao; Hisashi Kawai

arXiv:2101.03329·eess.AS·November 25, 2021

Open Access

TL;DR

This paper couples a joint Bayesian generative model with a discriminative neural network for speaker verification, so that likelihood-ratio scoring from the generative side is combined with the label-supervised suppression of nuisance features from the discriminative side.

Contribution

It proposes a novel coupling of a joint Bayesian generative model with a neural discriminative framework, enhancing speaker verification performance.

Findings

01 Significant performance improvement over state-of-the-art models

02 Effective integration of generative and discriminative learning

03 Robustness demonstrated on Speakers in the Wild and VoxCeleb datasets

Abstract

The speaker verification (SV) task is to decide whether an utterance is spoken by a target speaker or an impostor. In most studies, a log-likelihood ratio (LLR) score is estimated from a generative probability model of speaker features and compared with a threshold to make the decision. However, the generative model usually focuses on individual feature distributions, lacks the ability to select discriminative features, and is easily distracted by nuisance features. SV can also be formulated as a binary discrimination task to which neural network-based discriminative learning can be applied. In discriminative learning, nuisance features can be removed with the help of label supervision. However, discriminative learning pays more attention to classification boundaries and is prone to overfitting the training set, which may result in poor generalization on a test…
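
The LLR decision rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's joint Bayesian model: it assumes, purely for illustration, that the target and impostor hypotheses are each described by a diagonal-covariance Gaussian over a speaker feature vector. The function names, the `(mean, var)` model tuples, and the default threshold of 0.0 are all hypothetical.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian, summed over dimensions."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var))


def llr_score(x, target, impostor):
    """LLR = log p(x | target hypothesis) - log p(x | impostor hypothesis).

    `target` and `impostor` are hypothetical (mean, var) tuples standing in
    for whatever generative model is actually used.
    """
    return gaussian_logpdf(x, *target) - gaussian_logpdf(x, *impostor)


def accept(x, target, impostor, threshold=0.0):
    """Accept the utterance as the target speaker if the LLR exceeds the threshold."""
    return llr_score(x, target, impostor) > threshold
```

With a target model centred at (1, 1) and an impostor model centred at (-1, -1), both with unit variance, a feature vector near (1, 1) yields a positive LLR and is accepted, while one near (-1, -1) yields a negative LLR and is rejected. In practice the threshold would be tuned on held-out trials rather than fixed at zero.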

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Feature Selection