Coupling a generative model with a discriminative learning framework for speaker verification
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

TL;DR
This paper introduces a hybrid framework combining generative Bayesian models with discriminative neural networks for speaker verification, improving accuracy by leveraging the strengths of both approaches.
Contribution
It proposes a novel coupling of a joint Bayesian generative model with a neural discriminative framework, enhancing speaker verification performance.
Findings
Significant performance improvement over state-of-the-art models
Effective integration of generative and discriminative learning
Robustness demonstrated on Speakers in the Wild and VoxCeleb datasets
Abstract
The speaker verification (SV) task is to decide whether an utterance is spoken by a target speaker or an imposter. In most studies, a log-likelihood ratio (LLR) score is estimated from a generative probability model of speaker features and compared with a threshold to make the decision. However, the generative model usually focuses on individual feature distributions, lacks the ability to select discriminative features, and is easily distracted by nuisance features. SV can also be formulated as a binary discrimination task to which neural network-based discriminative learning can be applied. In discriminative learning, nuisance features can be removed with the help of label supervision. However, discriminative learning pays more attention to classification boundaries and is prone to overfitting the training set, which may result in bad generalization on a test…
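The LLR-and-threshold decision described in the abstract can be illustrated with a toy sketch. It is not the paper's model: it assumes scalar, zero-mean features and a simple two-covariance Gaussian model in which each feature is a speaker-specific value plus independent noise. The names `gaussian_pair_llr`, `verify`, `between_var`, and `within_var` are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pair_llr(x1, x2, between_var, within_var):
    """Log-likelihood ratio for 'same speaker' vs 'different speakers'.

    Assumed toy model (not taken from the paper): x = s + e, where the
    speaker factor s ~ N(0, between_var) and the noise e ~ N(0, within_var).
    """
    total_var = between_var + within_var

    # Same speaker: the pair shares s, so the covariance matrix is
    # [[total_var, between_var], [between_var, total_var]].
    det_same = total_var ** 2 - between_var ** 2
    quad_same = (total_var * (x1 * x1 + x2 * x2)
                 - 2.0 * between_var * x1 * x2) / det_same

    # Different speakers: the two features are independent.
    det_diff = total_var ** 2
    quad_diff = (x1 * x1 + x2 * x2) / total_var

    # The shared -log(2*pi) term cancels in the ratio, so it is omitted.
    log_p_same = -0.5 * math.log(det_same) - 0.5 * quad_same
    log_p_diff = -0.5 * math.log(det_diff) - 0.5 * quad_diff
    return log_p_same - log_p_diff


def verify(x1, x2, threshold=0.0, between_var=1.0, within_var=0.1):
    """Accept the trial as 'target speaker' when the LLR exceeds the threshold."""
    return gaussian_pair_llr(x1, x2, between_var, within_var) > threshold
```

With the default variances, two similar features such as `verify(1.0, 1.0)` are accepted and two opposed features such as `verify(1.0, -1.0)` are rejected. In practice the features would be high-dimensional speaker embeddings and the threshold would be tuned on held-out trials.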
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Feature Selection
