Investigation of Using VAE for i-Vector Speaker Verification
Timur Pekhovsky, Maxim Korenevsky

TL;DR
This paper explores the application of variational autoencoders (VAE) for i-vector based speaker verification, demonstrating effective unsupervised training and competitive performance with traditional methods on NIST SRE 2010 data.
Contribution
It introduces a VAE-based framework for speaker recognition, including a new likelihood ratio estimation method and insights into $\beta$-VAE performance for complex data modeling.
Findings
VAE provides effective speaker embeddings and can be trained in an unsupervised manner.
VAE-based system performance is close to diagonal PLDA in i-vector space.
$\beta$-VAE with small $\beta$ captures complex data features effectively.
Abstract
A new system for i-vector speaker recognition based on the variational autoencoder (VAE) is investigated. VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that VAE provides speaker embeddings and can be effectively trained in an unsupervised manner. A log-likelihood ratio (LLR) estimate for VAE is developed, and experiments on NIST SRE 2010 data demonstrate its correctness. Additionally, we show that the performance of the VAE-based system in the i-vector space is close to that of the diagonal PLDA. Several interesting results are also observed in the experiments with $\beta$-VAE. In particular, we found that for $\beta \ll 1$, VAE can be trained to capture the features of complex input data distributions in an effective way, which is hard to obtain in the standard VAE ($\beta = 1$).
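For readers unfamiliar with the role of $\beta$, the following is a minimal sketch of the commonly used $\beta$-VAE training loss for a single input vector. It is not taken from the paper: the function name `beta_vae_loss`, the unit-variance Gaussian decoder (which reduces the reconstruction term to a squared error), and the diagonal Gaussian encoder with a standard normal prior are all assumptions made for illustration.

```python
import math


def beta_vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative beta-weighted ELBO for one sample (illustrative sketch).

    Assumptions (not from the paper): unit-variance Gaussian decoder,
    diagonal Gaussian encoder q(z|x) = N(mu, diag(exp(log_var))),
    and a standard normal prior p(z) = N(0, I).
    """
    # Reconstruction term: negative Gaussian log-likelihood up to a constant.
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Closed-form KL divergence between q(z|x) and the standard normal prior.
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    # beta = 1 gives the standard VAE loss; beta < 1 down-weights the KL term.
    return recon + beta * kl
```

With `beta=1.0` this is the standard VAE objective; a small `beta` weakens the pull of the encoder posterior toward the prior, which is the regime the abstract reports as working better on complex input distributions.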
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
