Shared latent subspace modelling within Gaussian-Binary Restricted Boltzmann Machines for NIST i-Vector Challenge 2014

Danila Doroshin; Alexander Yamshinin; Nikolay Lubimov; Marina Nastasenko; Mikhail Kotov; Maxim Tkachenko

arXiv:1503.05471 · cs.LG · March 19, 2015

Open Access

TL;DR

This paper introduces a speaker subspace model based on Gaussian-Binary Restricted Boltzmann Machines (GRBM) in which a speaker factor is shared across all vectors of the same speaker, and reports speaker verification results on the NIST i-Vector Challenge 2014 dataset.

Contribution

It proposes a GRBM-based model with shared speaker factors, introduces maximum likelihood parameter estimation for that model, and proposes several scoring methods for speaker verification.

Findings

01

Effective speaker verification on NIST i-vector dataset

02

Shared latent subspace improves modeling accuracy

03

New scoring techniques enhance verification performance

Abstract

This paper presents a novel approach to speaker subspace modelling based on Gaussian-Binary Restricted Boltzmann Machines (GRBM). The proposed model is based on the idea of shared factors, as in Probabilistic Linear Discriminant Analysis (PLDA). The GRBM hidden layer is divided into speaker and channel factors, where the speaker factor is shared over all vectors of the speaker. Maximum Likelihood Parameter Estimation (MLE) for the proposed model is then introduced. Various new scoring techniques for speaker verification using GRBM are proposed. Results on the NIST i-Vector Challenge 2014 dataset are presented.
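The split of the hidden layer into a shared speaker part and a per-vector channel part can be illustrated with a small NumPy sketch. This is not the authors' formulation: it assumes a textbook unit-variance GRBM energy, counts the speaker bias once per speaker, and uses hypothetical names (`W_s`, `W_c`, `c_s`, `c_c`, `b`) that do not come from the paper. Under those assumptions, the posterior of the shared speaker units pools evidence over all of a speaker's vectors, while each channel unit depends only on its own vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_energy(V, h_s, H_c, W_s, W_c, b, c_s, c_c):
    """Energy of one speaker's vectors under an assumed shared-factor GRBM.

    V   : (N, D) the speaker's N input vectors (e.g. i-vectors)
    h_s : (K_s,) binary speaker units, shared by all N vectors
    H_c : (N, K_c) binary channel units, one row per vector
    Unit visible variance is assumed; the speaker bias c_s is counted once.
    """
    quad = 0.5 * np.sum((V - b) ** 2)
    speaker = -np.sum(V @ W_s @ h_s) - c_s @ h_s
    channel = -np.sum((V @ W_c) * H_c) - np.sum(H_c @ c_c)
    return quad + speaker + channel

def speaker_posterior(V, W_s, c_s):
    """P(h_s = 1 | all vectors of the speaker): input is summed over vectors."""
    return sigmoid(c_s + V.sum(axis=0) @ W_s)

def channel_posterior(V, W_c, c_c):
    """P(h_c = 1 | v_n), computed independently for each vector v_n."""
    return sigmoid(c_c + V @ W_c)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, D, K_s, K_c = 5, 8, 3, 4          # toy sizes, chosen arbitrarily
    V = rng.normal(size=(N, D))
    W_s = rng.normal(size=(D, K_s))
    W_c = rng.normal(size=(D, K_c))
    c_s, c_c = np.zeros(K_s), np.zeros(K_c)
    print(speaker_posterior(V, W_s, c_s).shape)   # one posterior per speaker unit
    print(channel_posterior(V, W_c, c_c).shape)   # one row of posteriors per vector
```

With this energy, the speaker posterior follows directly from the energy gap between flipping one speaker unit on and off, which is why the per-vector inputs add up inside the sigmoid.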

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing