Max-margin Metric Learning for Speaker Recognition

Lantian Li; Dong Wang; Chao Xing; Thomas Fang Zheng

arXiv:1510.05940 · cs.SD · April 1, 2016

TL;DR

This paper introduces a max-margin metric learning method for speaker recognition that directly optimizes the margin between target and imposter trials, and matches or outperforms PLDA on the SRE08 core test.

Contribution

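It proposes a max-margin metric learning approach that learns a linear transform of speaker vectors (i-vectors) for speaker recognition. The approach addresses two limitations of PLDA: its Gaussian assumption on speaker vectors, and an objective function that is not directly tied to the task of discriminating true speakers from imposters.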
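To make the scoring side concrete: once a linear transform M has been learned, a trial can be scored by the cosine between the two projected vectors. The sketch below assumes NumPy, a transform matrix `M`, and a hypothetical threshold-based `accept_trial` rule; the function names and the toy vectors are illustrative and are not taken from the paper's code.

```python
import numpy as np

def transformed_cosine_score(M, enroll, test):
    """Cosine similarity between two speaker vectors after projecting both with M.
    Assumes neither projected vector is zero."""
    u, v = M @ enroll, M @ test
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def accept_trial(M, enroll, test, threshold):
    """Hypothetical decision rule: call it a target trial if the score clears the threshold."""
    return transformed_cosine_score(M, enroll, test) >= threshold

# Toy 2-D vectors: dimension 0 agrees, dimension 1 (nuisance-like) disagrees strongly.
enroll = np.array([1.0, 5.0])
test = np.array([1.0, -5.0])

plain = transformed_cosine_score(np.eye(2), enroll, test)                 # about -0.923
projected = transformed_cosine_score(np.diag([1.0, 0.1]), enroll, test)   # about 0.6
```

With M set to the identity this reduces to plain cosine scoring. The hand-picked diagonal M (not a learned one) down-weights the disagreeing dimension, which turns a clearly rejected pair into an accepted one at, say, a threshold of 0.5.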

Findings

1. Achieves performance comparable to or better than PLDA on the SRE08 core test

2. Scores trials with a simple cosine similarity, yet still matches or outperforms PLDA

3. Avoids the Gaussian assumption that PLDA places on speaker vectors

Abstract

Probabilistic linear discriminant analysis (PLDA) is a popular normalization approach for the i-vector model and has delivered state-of-the-art performance in speaker recognition. A potential problem of the PLDA model, however, is that it essentially assumes Gaussian distributions over speaker vectors, which is not always true in practice. Additionally, its objective function is not directly related to the goal of the task, i.e., discriminating true speakers from imposters. In this paper, we propose a max-margin metric learning approach to address these problems. It learns a linear transform with the criterion that the margin between target and imposter trials is maximized. Experiments conducted on the SRE08 core test show that, compared to PLDA, the new approach obtains comparable or even better performance, even though the scoring is simply a cosine computation.
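As a rough illustration of the max-margin criterion described above, the sketch below learns a square transform M by plain subgradient descent on a triplet hinge loss: for each (anchor, target, imposter) triple, it penalizes cases where the target cosine does not exceed the imposter cosine by at least a margin delta. The triplet form of the loss, the margin value, the optimizer, the toy data, and all function names are assumptions for illustration; the paper's exact objective and training procedure may differ.

```python
import numpy as np

def cosine_and_grad(M, x, y):
    """Return cos(Mx, My) and its gradient with respect to M."""
    u, v = M @ x, M @ y
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    c = float(u @ v / (nu * nv))
    du = v / (nu * nv) - c * u / nu**2   # d cos / d u
    dv = u / (nu * nv) - c * v / nv**2   # d cos / d v
    # Chain rule through u = Mx and v = My.
    return c, np.outer(du, x) + np.outer(dv, y)

def margin_loss(M, triplets, delta):
    """Mean hinge loss max(0, delta - (s_target - s_imposter)) and its subgradient."""
    loss, grad = 0.0, np.zeros_like(M)
    for anchor, target, imposter in triplets:
        s_t, g_t = cosine_and_grad(M, anchor, target)
        s_i, g_i = cosine_and_grad(M, anchor, imposter)
        violation = delta - (s_t - s_i)
        if violation > 0:  # only triplets inside the margin contribute
            loss += violation
            grad += g_i - g_t
    n = len(triplets)
    return loss / n, grad / n

def learn_transform(triplets, dim, delta=0.2, lr=0.1, epochs=300):
    """Plain subgradient descent on M, starting from the identity (plain cosine)."""
    M = np.eye(dim)
    for _ in range(epochs):
        _, grad = margin_loss(M, triplets, delta)
        M -= lr * grad
    return M

def make_toy_triplets(n=60, seed=0):
    """Hypothetical 3-D toy data: dim 0 carries speaker identity, dims 1-2 carry large nuisance."""
    rng = np.random.default_rng(seed)

    def sample(speaker):
        return np.array([3.0 * speaker, 0.0, 0.0]) + rng.normal(0.0, [0.3, 3.0, 3.0])

    triplets = []
    for k in range(n):
        spk = 1 if k % 2 == 0 else -1
        triplets.append((sample(spk), sample(spk), sample(-spk)))
    return triplets

triplets = make_toy_triplets()
loss_before, _ = margin_loss(np.eye(3), triplets, 0.2)
M = learn_transform(triplets, dim=3)
loss_after, _ = margin_loss(M, triplets, 0.2)
```

Because cosine scoring ignores the overall scale of M, the learning signal here acts only on the relative weighting and mixing of dimensions; on this toy data that should mean suppressing the nuisance dimensions relative to the identity-bearing one. At test time the learned M would be used with plain cosine scoring, as in the abstract.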

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing