Speaker Verification By Partial AUC Optimization With Mahalanobis Distance Metric Learning

Zhongxin Bai; Xiao-Lei Zhang; Jingdong Chen

arXiv:1902.00889 · eess.AS · April 22, 2020 · IEEE/ACM Trans. Audio Speech Lang. Process. · 1 citation

TL;DR

This paper optimizes the partial area under the ROC curve (pAUC) for speaker verification with a Mahalanobis distance metric learning back-end, so that training targets the part of the ROC curve on which a given application actually operates.

Contribution

The paper proposes pAUC as the optimization objective of a Mahalanobis distance metric learning back-end, combined with feature preprocessing techniques, to improve speaker verification accuracy.

Findings

01. Outperforms state-of-the-art back-ends on the NIST SRE16 and SITW datasets.

02. Achieves better results across seven evaluation metrics.

03. The back-end's optimization objective is convex, so the global optimum is achievable.

Abstract

Receiver operating characteristic (ROC) and detection error tradeoff (DET) curves are two widely used evaluation metrics for speaker verification. They are equivalent since the latter can be obtained by transforming the former's true positive y-axis to false negative y-axis and then re-scaling both axes by a probit operator. Real-world speaker verification systems, however, usually work on part of the ROC curve instead of the entire ROC curve given an application. Therefore, we propose in this paper to use the area under part of the ROC curve (pAUC) as a more efficient evaluation metric for speaker verification. A Mahalanobis distance metric learning based back-end is applied to optimize pAUC, where the Mahalanobis distance metric learning guarantees that the optimization objective of the back-end is a convex one so that the global optimum solution is achievable. To improve the…
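The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a higher score means "same speaker", uses a common pairwise-ranking estimator of the normalized pAUC over a false positive rate range [alpha, beta], and takes the matrix M as given rather than learned. The function names (probit, det_point, partial_auc, mahalanobis_sq) and the toy scores are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def probit(p):
    """Inverse CDF of the standard normal distribution; requires 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def det_point(fpr, tpr):
    """Map one ROC point (FPR, TPR) to DET coordinates.

    The y-axis becomes the false negative rate (1 - TPR), and both
    axes are then rescaled by the probit function.
    """
    return probit(fpr), probit(1.0 - tpr)


def partial_auc(pos_scores, neg_scores, alpha=0.0, beta=1.0):
    """Normalized empirical pAUC over the FPR range [alpha, beta].

    Assumption: this is one common estimator, not necessarily the exact
    one used in the paper. Negative (nontarget) trials are sorted by
    descending score; only those ranked between alpha and beta are kept,
    and the result is the fraction of (positive, kept negative) pairs
    that are ranked correctly. Ties count as half a correct pair.
    """
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("need 0 <= alpha < beta <= 1")
    neg_sorted = sorted(neg_scores, reverse=True)
    n = len(neg_sorted)
    selected = neg_sorted[math.ceil(alpha * n):math.floor(beta * n)]
    if not pos_scores or not selected:
        raise ValueError("no trials fall inside the requested FPR range")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos_scores
        for q in selected
    )
    return wins / (len(pos_scores) * len(selected))


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    M is assumed to be a positive semidefinite matrix given as nested
    lists; in metric learning it would be the learned parameter.
    """
    d = [a - b for a, b in zip(x, y)]
    k = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(k) for j in range(k))


# Toy scores (hypothetical): target trials should score higher.
target = [0.9, 0.8, 0.3]
nontarget = [0.85, 0.4, 0.2, 0.1]
print(partial_auc(target, nontarget, 0.0, 0.5))  # 0.5
print(partial_auc(target, nontarget))            # 0.75 (full AUC)
```

With alpha = 0 and beta = 1 the estimator reduces to the ordinary AUC. Narrowing the range to [0, 0.5] keeps only the two highest-scoring nontarget trials, which are the hardest ones to separate, so the value drops from 0.75 to 0.5 on this toy data.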

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing