Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification
Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

TL;DR
This paper introduces Bayesian neural networks into x-vector speaker verification systems to enhance their ability to generalize across different domains and environmental conditions, especially under severe mismatch scenarios.
Contribution
The integration of Bayesian neural networks into x-vector systems is novel, providing improved generalization and accuracy in speaker verification, particularly with out-of-domain data.
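A Bayesian neural network treats each weight as a distribution rather than a fixed value. The following is a minimal NumPy sketch of one such layer, assuming a diagonal Gaussian over the weights sampled via the reparameterization trick. The class name `BayesianLinear`, the layer sizes, and the initial values are hypothetical and are not taken from the paper, whose exact formulation is not given in this summary.

```python
import numpy as np

class BayesianLinear:
    """Illustrative linear layer with a Gaussian distribution over weights.

    Assumption: diagonal Gaussian posterior, parameterised by a mean and a
    log standard deviation per weight. Not the paper's implementation.
    """

    def __init__(self, in_dim, out_dim, rng):
        self.rng = rng
        self.w_mean = rng.normal(0.0, 0.1, size=(in_dim, out_dim))
        self.w_log_std = np.full((in_dim, out_dim), -3.0)  # small initial std
        self.bias = np.zeros(out_dim)

    def sample_weights(self):
        # Reparameterization: w = mean + std * eps, with eps ~ N(0, 1).
        eps = self.rng.standard_normal(self.w_mean.shape)
        return self.w_mean + np.exp(self.w_log_std) * eps

    def forward(self, x, sample=True):
        # With sample=False the layer behaves like a standard linear layer.
        w = self.sample_weights() if sample else self.w_mean
        return x @ w + self.bias

rng = np.random.default_rng(0)
layer = BayesianLinear(in_dim=4, out_dim=3, rng=rng)
x = rng.standard_normal((2, 4))  # two synthetic input frames

mean_out = layer.forward(x, sample=False)
# Averaging many sampled forward passes approximates the mean-weight output.
mc_out = np.mean([layer.forward(x) for _ in range(200)], axis=0)
```

Each sampled forward pass gives a slightly different output. Averaging over samples is one common way to obtain a prediction at evaluation time, and using the mean weights directly is another.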
Findings
BNNs reduce the equal error rate (EER) by up to 4.69% (relative) in out-of-domain evaluations.
BNNs reduce EER by approximately 2-3% (relative) in in-domain scenarios.
Fusion of Bayesian and standard x-vector systems yields further gains.
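To make the EER figures above concrete, the sketch below approximates an equal error rate from trial scores and computes a relative reduction between two systems. It assumes the reported percentages are relative reductions, as the abstract's phrasing ("by a relative…") suggests. The function names `equal_error_rate` and `relative_reduction` are hypothetical, and the threshold sweep is a simple approximation (evaluation toolkits may interpolate differently).

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, in percent, of a new system over a baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

For example, with made-up values, a baseline EER of 0.10 and a new EER of 0.09 correspond to a relative reduction of about 10%, even though the absolute difference is only one percentage point.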
Abstract
Speaker verification systems usually suffer from mismatch between training and evaluation data, such as speaker population mismatch and channel and environment variations. Addressing this issue requires the system to have good generalization ability on unseen data. In this work, we incorporate Bayesian neural networks (BNNs) into the deep neural network (DNN) x-vector speaker verification system to improve the system's generalization ability. With the weight uncertainty modeling provided by BNNs, we expect the system to generalize better on the evaluation data and make verification decisions more accurately. Our experimental results indicate that the DNN x-vector system could benefit from BNNs, especially when the mismatch problem is severe for evaluations using out-of-domain data. Specifically, results show that the system could benefit from BNNs by a relative…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
