Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Xu Li; Jinghua Zhong; Jianwei Yu; Shoukang Hu; Xixin Wu; Xunying Liu; Helen Meng

arXiv:2004.04014 · eess.AS · April 9, 2020 · 5 cites

TL;DR

This paper introduces Bayesian neural networks into x-vector speaker verification systems to enhance their ability to generalize across different domains and environmental conditions, especially under severe mismatch scenarios.

Contribution

The integration of Bayesian neural networks into x-vector systems is novel, providing improved generalization and accuracy in speaker verification, particularly with out-of-domain data.

Findings

01. BNNs reduce EER by up to 4.69% (relative) in out-of-domain evaluations.

02. BNNs reduce EER by approximately 2-3% (relative) in in-domain evaluations.

03. Fusion of Bayesian and standard x-vector systems yields further gains.
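
The abstract describes the gain as relative, so the percentages above are best read as relative EER reductions rather than absolute differences in EER points. A minimal sketch of that arithmetic follows; the function name and the baseline and system EER values are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def relative_eer_decrease(baseline_eer: float, new_eer: float) -> float:
    """Relative EER decrease in percent: 100 * (baseline - new) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical EER values (in percent), for illustration only.
baseline = 5.00   # assumed EER of a baseline system
improved = 4.80   # assumed EER of the compared system

# An absolute drop of 0.20 points on a 5.00% baseline is a 4.00% relative decrease.
print(f"{relative_eer_decrease(baseline, improved):.2f}% relative decrease")
```

Under these assumed numbers, a small absolute change corresponds to a noticeably larger relative figure, which is why the two should not be confused when comparing systems.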

Abstract

Speaker verification systems usually suffer from mismatch between training and evaluation data, such as speaker population mismatch and channel and environment variations. Addressing this issue requires the system to generalize well to unseen data. In this work, we incorporate Bayesian neural networks (BNNs) into the deep neural network (DNN) x-vector speaker verification system to improve the system's generalization ability. With the weight uncertainty modeling provided by BNNs, we expect the system to generalize better on the evaluation data and to make verification decisions more accurately. Our experimental results indicate that the DNN x-vector system can benefit from BNNs, especially when the mismatch is severe, as in evaluations using out-of-domain data. Specifically, results show that the system could benefit from BNNs by a relative…
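
To make "weight uncertainty modeling" concrete, here is a minimal sketch of one common formulation: each weight has a Gaussian distribution with a learnable mean and standard deviation, and a forward pass draws a weight sample via the reparameterization trick. This is an assumption for illustration; the abstract does not state the paper's parameterization, which layers are made Bayesian, or the training objective, and the class and parameter names (`BayesianLinear`, `mu`, `rho`) are hypothetical.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); keeps the standard deviation positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class BayesianLinear:
    """Linear layer (no bias) whose weights are Gaussian random variables."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random):
        self.rng = rng
        # Variational parameters: one mean and one unconstrained rho per weight.
        self.mu = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        self.rho = [[-3.0] * n_in for _ in range(n_out)]

    def sample_weights(self):
        # Reparameterization: w = mu + softplus(rho) * eps, with eps ~ N(0, 1).
        return [
            [m + softplus(r) * self.rng.gauss(0.0, 1.0) for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(self.mu, self.rho)
        ]

    def forward(self, x, sample: bool = True):
        # With sample=False the layer uses the mean weights and is deterministic.
        w = self.sample_weights() if sample else self.mu
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


rng = random.Random(0)
layer = BayesianLinear(n_in=4, n_out=2, rng=rng)
x = [0.5, -1.0, 0.25, 2.0]

y1 = layer.forward(x)                     # one weight sample
y2 = layer.forward(x)                     # a different weight sample
y_mean = layer.forward(x, sample=False)   # mean weights only
print(y1, y2, y_mean)
```

Repeated sampled passes give slightly different outputs for the same input, while the mean-weight pass is deterministic. In a full system the means and `rho` values would be trained, typically with a variational objective, which this sketch omits.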

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques