A Double Joint Bayesian Approach for J-Vector Based Text-dependent Speaker Verification
Ziqiang Shi, Mengjiao Wang, Liu Liu, Huibin Lin, Rujie Liu

TL;DR
This paper introduces a Double Joint Bayesian (DoJoBa) model that explicitly captures multi-faceted information in j-vectors for text-dependent speaker verification, significantly improving accuracy on the RSR2015 dataset.
Contribution
The paper proposes a novel generative DoJoBa model that jointly models multiple heterogeneous features in j-vectors, enhancing verification performance.
Findings
Achieves a 0.02% equal error rate (EER) on the RSR2015 dataset.
Effectively models multi-faceted information in j-vectors.
Outperforms previous back-end classifiers in speaker verification.
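For readers unfamiliar with the metric in the findings above, the equal error rate is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (target trials rejected). The following is a minimal sketch of one common way to estimate it from trial scores, not the evaluation code used in the paper; the function name `equal_error_rate` and the toy scores are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration; higher scores mean "same label".
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials scored at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])

# Toy scores (made up for this example, not results from the paper).
toy_target = [0.9, 0.8, 0.7, 0.2]
toy_impostor = [0.1, 0.3, 0.4, 0.75]
toy_eer = equal_error_rate(toy_target, toy_impostor)
```

With only a handful of trials the estimate is coarse; evaluation toolkits typically interpolate between thresholds, which this sketch does not attempt.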
Abstract
The j-vector has proved very effective in text-dependent speaker verification with short-duration speech. However, current state-of-the-art back-end classifiers, e.g. the joint Bayesian model, cannot make full use of such deep features. In this paper, we generalize the standard joint Bayesian approach to model the multi-faceted information in the j-vector explicitly and jointly. In our generalization, the j-vector is modeled as the output of a generative Double Joint Bayesian (DoJoBa) model, which contains several kinds of latent variables. With DoJoBa, we are able to explicitly build a model that combines multiple heterogeneous kinds of information from the j-vectors. In the verification step, we calculate the likelihood to describe whether two j-vectors have consistent labels or not. On the public RSR2015 data corpus, the experimental results showed that our approach can…
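The abstract builds on the standard joint Bayesian back-end, so a sketch of that baseline may help. In the standard model, each zero-mean feature is x = mu + eps, with an identity component mu ~ N(0, S_mu) and a within-class component eps ~ N(0, S_eps), and a trial is scored by the log-likelihood ratio between the "same label" and "different label" hypotheses. The code below is a minimal illustration of that single-label baseline only, not the DoJoBa model itself, whose details are not given in this summary; the names `joint_bayesian_llr`, `s_mu`, and `s_eps` and the toy covariances are assumptions.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """Log-density of a zero-mean multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

def joint_bayesian_llr(x1, x2, s_mu, s_eps):
    """log p(x1, x2 | same label) - log p(x1, x2 | different labels).

    Hypothetical helper: s_mu and s_eps are the between-class and
    within-class covariances, assumed to be already estimated.
    """
    total = s_mu + s_eps
    zeros = np.zeros_like(s_mu)
    # Same label: both vectors share mu, so the cross-covariance is s_mu.
    cov_same = np.block([[total, s_mu], [s_mu, total]])
    # Different labels: the two vectors are independent.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    pair = np.concatenate([x1, x2])
    return gaussian_logpdf(pair, cov_same) - gaussian_logpdf(pair, cov_diff)

# Toy covariances (illustrative values, not estimated from RSR2015).
s_mu = 4.0 * np.eye(3)
s_eps = 0.1 * np.eye(3)
x = np.array([1.0, 2.0, -1.0])

llr_match = joint_bayesian_llr(x, x + 0.05, s_mu, s_eps)  # nearby vectors
llr_mismatch = joint_bayesian_llr(x, -x, s_mu, s_eps)     # opposed vectors
```

A positive ratio favors the same-label hypothesis and a negative one favors different labels. Practical implementations usually precompute closed-form matrices instead of inverting the stacked covariance for every trial; the direct form is used here because it mirrors the two hypotheses most plainly.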
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
