Investigation of Using VAE for i-Vector Speaker Verification

Timur Pekhovsky; Maxim Korenevsky

arXiv:1705.09185·cs.SD·May 26, 2017·5 cites

TL;DR

This paper explores the application of variational autoencoders (VAE) for i-vector based speaker verification, demonstrating effective unsupervised training and competitive performance with traditional methods on NIST SRE 2010 data.

Contribution

It introduces a VAE-based framework for speaker recognition, including a new likelihood ratio estimation method and insights into $\beta$-VAE performance for complex data modeling.

Findings

1. VAE provides effective speaker embeddings trained in an unsupervised manner.

2. The VAE-based system performs close to diagonal PLDA in the i-vector space.

3. $\beta$-VAE with small $\beta$ captures complex data features effectively.
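The comparison with diagonal PLDA refers to a standard Gaussian scoring baseline. A minimal sketch of its verification score follows. It assumes, as one common reading of "diagonal PLDA", a two-covariance model in which mean-centred i-vectors have a diagonal between-speaker covariance `b` and a diagonal within-speaker covariance `w`. The function name `diagonal_plda_llr` is hypothetical, parameter estimation (e.g. by EM) is omitted, and this is the baseline, not the authors' VAE-based LLR estimate.

```python
import numpy as np


def diagonal_plda_llr(x1, x2, b, w):
    """Same-vs-different-speaker log-likelihood ratio under an ASSUMED
    two-covariance Gaussian PLDA with diagonal covariances.

    Per dimension d (inputs assumed already mean-centred):
        x = y + e,  y ~ N(0, b[d]) shared by a speaker,  e ~ N(0, w[d]) per session.
    Same speaker:      joint covariance [[t, b], [b, t]] with t = b + w.
    Different speaker: joint covariance [[t, 0], [0, t]].
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    b, w = np.asarray(b, float), np.asarray(w, float)
    t = b + w                      # total variance per dimension
    det_same = t * t - b * b       # determinant of the 2x2 same-speaker covariance
    # Quadratic form v^T Sigma_same^{-1} v via the closed-form 2x2 inverse
    quad_same = (t * x1**2 - 2.0 * b * x1 * x2 + t * x2**2) / det_same
    quad_diff = (x1**2 + x2**2) / t
    # The -log(2*pi) normalisers cancel between the two hypotheses
    llr_per_dim = (-0.5 * np.log(det_same) - 0.5 * quad_same
                   + np.log(t) + 0.5 * quad_diff)
    return float(np.sum(llr_per_dim))
```

Because both covariances are diagonal, the LLR factorises over dimensions, and each term needs only a closed-form 2×2 determinant and inverse.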

Abstract

A new system for i-vector speaker recognition based on the variational autoencoder (VAE) is investigated. VAE is a promising approach for developing accurate deep nonlinear generative models of complex data. Experiments show that VAE provides speaker embeddings and can be effectively trained in an unsupervised manner. An LLR estimate for VAE is developed, and experiments on NIST SRE 2010 data demonstrate its correctness. Additionally, we show that the performance of the VAE-based system in the i-vector space is close to that of the diagonal PLDA. Several interesting results are also observed in the experiments with $\beta$-VAE. In particular, we found that for $\beta \ll 1$, VAE can be trained to capture the features of complex input data distributions effectively, which is hard to achieve with the standard VAE ($\beta = 1$).
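The $\beta$-VAE results hinge on how $\beta$ weights the KL term of the training objective. The following is a minimal sketch of a per-batch loss. It assumes a diagonal-Gaussian encoder, a standard-normal prior and a fixed-variance Gaussian decoder over real-valued i-vectors, choices made here for illustration rather than taken from the paper. `beta_vae_loss` and `reparameterize` are hypothetical helper names, and the networks that produce `mu`, `logvar` and `x_hat` are omitted.

```python
import numpy as np


def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0, sigma2=1.0):
    """Negative beta-weighted ELBO averaged over a batch (lower is better).

    ASSUMED model (illustrative only):
      * encoder q(z|x) = N(mu, diag(exp(logvar))), prior p(z) = N(0, I);
      * decoder p(x|z) = N(x_hat, sigma2 * I) with fixed variance sigma2.
    beta = 1 gives the standard VAE objective; beta < 1 down-weights the KL term.
    """
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    mu, logvar = np.asarray(mu, float), np.asarray(logvar, float)
    # Gaussian reconstruction term -log p(x|z), summed over input dimensions
    recon = 0.5 * np.sum((x - x_hat) ** 2 / sigma2 + np.log(2.0 * np.pi * sigma2), axis=-1)
    # Closed-form KL(q(z|x) || N(0, I)), summed over latent dimensions
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)
    return float(np.mean(recon + beta * kl))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterisation trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu, float) + np.exp(0.5 * np.asarray(logvar, float)) * eps
```

With $\beta \ll 1$, the KL penalty barely constrains the approximate posterior, so training can favour reconstruction accuracy. This is offered only as an intuitive reading of the objective, not as the authors' explanation.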

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques