VB calibration to improve the interface between phone recognizer and i-vector extractor

Niko Brümmer

arXiv:1510.03203 · stat.ML · October 15, 2015 · 2 cites

TL;DR

This paper reinterprets the classical i-vector extractor recipe as a mean-field variational Bayes (VB) method, uses that interpretation to unify it with the newer phonetic i-vector extractor, and proposes posterior calibration to make existing extractors more accurate.

Contribution

It provides a VB-based theoretical framework for i-vector extractors and introduces posterior calibration methods intended to improve their accuracy.

Findings

01. VB interpretation unifies classical and phonetic i-vector extractors

02. Posterior calibration improves extractor accuracy

03. Proposed modifications lead to better model fit

Abstract

The EM training algorithm of the classical i-vector extractor is often incorrectly described as a maximum-likelihood method. The i-vector model is however intractable: the likelihood itself and the hidden-variable posteriors needed for the EM algorithm cannot be computed in closed form. We show here that the classical i-vector extractor recipe is actually a mean-field variational Bayes (VB) recipe. This theoretical VB interpretation turns out to be of further use, because it also offers an interpretation of the newer phonetic i-vector extractor recipe, thereby unifying the two flavours of extractor. More importantly, the VB interpretation is also practically useful: it suggests ways of modifying existing i-vector extractors to make them more accurate. In particular, in existing methods, the approximate VB posterior for the GMM states is fixed, while only the parameters of the…
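
To make the "fixed state posterior" idea concrete, the following is a minimal sketch of the standard i-vector posterior computation, assuming a diagonal-covariance GMM, a standard normal prior on the i-vector, and state responsibilities that are simply handed in and held fixed. It is not the calibration method of the paper, whose details are cut off in the abstract above. The function name `ivector_posterior` and all argument names are hypothetical, chosen for illustration only.

```python
import numpy as np


def ivector_posterior(feats, resp, means, precs, T):
    """Gaussian q(x) for one utterance, holding the state posteriors fixed.

    All names here are illustrative assumptions, not taken from the paper.
    feats: (n_frames, D) feature vectors
    resp:  (n_frames, C) fixed state responsibilities (each row sums to 1)
    means: (C, D) GMM component means
    precs: (C, D) diagonal precisions (inverse variances) per component
    T:     (C, D, R) per-component factor loading matrices
    """
    C, D, R = T.shape
    n = resp.sum(axis=0)                        # zero-order statistics, shape (C,)
    f = resp.T @ feats - n[:, None] * means     # centred first-order statistics, (C, D)

    precision = np.eye(R)                       # contribution of the N(0, I) prior
    linear = np.zeros(R)
    for c in range(C):
        TtP = T[c].T * precs[c]                 # T_c^T Sigma_c^{-1}, shape (R, D)
        precision += n[c] * (TtP @ T[c])
        linear += TtP @ f[c]

    cov = np.linalg.inv(precision)              # posterior covariance of the i-vector
    return cov @ linear, cov                    # posterior mean and covariance
```

In this sketch the responsibilities `resp` could come from a GMM or from a phone recognizer; either way they are treated as a fixed input and only the Gaussian factor for the i-vector is computed from them.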

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Blind Source Separation Techniques