Speaker Verification By Partial AUC Optimization With Mahalanobis Distance Metric Learning
Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen

TL;DR
This paper proposes optimizing the partial area under the ROC curve (pAUC) for speaker verification with a Mahalanobis distance metric learning back-end, concentrating on the part of the ROC curve where a given application actually operates.
Contribution
It proposes pAUC as the optimization target for speaker verification and optimizes it with a Mahalanobis distance metric learning back-end, combined with feature preprocessing techniques, to improve verification accuracy.
Findings
Outperforms state-of-the-art back-ends on NIST SRE16 and SITW datasets.
Achieves better results across seven evaluation metrics.
The Mahalanobis distance metric learning formulation makes the back-end's objective convex, so the global optimum is achievable.
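To make the Mahalanobis back-end more concrete, the following is a minimal sketch of how a trial could be scored with a squared Mahalanobis distance d_M(x, y) = (x - y)^T M (x - y), where M is a positive semidefinite matrix. The function name `mahalanobis_sq`, the plain-list representation, and the toy values are illustrative assumptions and are not taken from the paper.

```python
def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y).

    x, y: equal-length sequences of floats (e.g. speaker embeddings).
    M:    square matrix as a list of rows; assumed positive semidefinite.
    """
    d = [a - b for a, b in zip(x, y)]
    # Compute M @ d row by row, then take the dot product with d.
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return sum(d_i * md_i for d_i, md_i in zip(d, Md))


# Hypothetical toy metric: M = L^T L is positive semidefinite by construction.
L = [[2.0, 0.0],
     [1.0, 1.0]]
M = [[sum(L[k][i] * L[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

identity = [[1.0, 0.0], [0.0, 1.0]]
euclid = mahalanobis_sq([1.0, 0.0], [0.0, 1.0], identity)  # plain squared Euclidean
learned = mahalanobis_sq([1.0, 0.0], [0.0, 1.0], M)        # distance under M
```

With the identity matrix the distance reduces to the squared Euclidean distance. A smaller distance would indicate that two utterances are more likely to come from the same speaker, so a verification score could be taken as the negated distance.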
Abstract
Receiver operating characteristic (ROC) and detection error tradeoff (DET) curves are two widely used evaluation metrics for speaker verification. They are equivalent since the latter can be obtained by transforming the former's true positive y-axis to false negative y-axis and then re-scaling both axes by a probit operator. Real-world speaker verification systems, however, usually work on part of the ROC curve instead of the entire ROC curve given an application. Therefore, we propose in this paper to use the area under part of the ROC curve (pAUC) as a more efficient evaluation metric for speaker verification. A Mahalanobis distance metric learning based back-end is applied to optimize pAUC, where the Mahalanobis distance metric learning guarantees that the optimization objective of the back-end is a convex one so that the global optimum solution is achievable. To improve the…
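The two ideas in the portion of the abstract shown here, the ROC-to-DET transformation and the partial AUC, can be illustrated with a short sketch. It assumes that pAUC over a false positive rate range [alpha, beta] is estimated by comparing every target score against only those non-target scores whose descending rank falls in that range, and then normalizing to [0, 1]. The function names, the rank rounding, and the tie handling are assumptions made for illustration, not the paper's exact definition.

```python
from math import ceil, floor
from statistics import NormalDist


def roc_to_det(fpr, tpr):
    """Map one ROC point to DET coordinates.

    The y-axis changes from true positive rate to false negative rate
    (1 - tpr), and both axes are rescaled by the probit function.
    Rates must lie strictly between 0 and 1.
    """
    probit = NormalDist().inv_cdf
    return probit(fpr), probit(1.0 - tpr)


def partial_auc(target_scores, nontarget_scores, alpha=0.0, beta=0.1):
    """Normalized empirical pAUC over the FPR range [alpha, beta]."""
    if not 0.0 <= alpha < beta <= 1.0:
        raise ValueError("require 0 <= alpha < beta <= 1")
    ranked = sorted(nontarget_scores, reverse=True)  # hardest non-targets first
    n = len(ranked)
    # Small epsilon guards against floating-point noise in alpha * n, beta * n.
    lo = ceil(alpha * n - 1e-9)
    hi = floor(beta * n + 1e-9)
    selected = ranked[lo:hi]
    if not selected or not target_scores:
        raise ValueError("no trials fall inside the requested FPR range")
    # Count correctly ordered (target, non-target) pairs; ties count as half.
    wins = sum(
        1.0 if t > s else 0.5 if t == s else 0.0
        for t in target_scores
        for s in selected
    )
    return wins / (len(target_scores) * len(selected))


# Hypothetical toy scores (higher means "more likely the same speaker").
targets = [0.9, 0.4]
nontargets = [0.8, 0.5, 0.2, 0.1]
full_auc = partial_auc(targets, nontargets, alpha=0.0, beta=1.0)
low_fpr_pauc = partial_auc(targets, nontargets, alpha=0.0, beta=0.5)
```

Setting alpha=0 and beta=1 recovers the ordinary AUC, while a narrow range keeps only the highest-scoring non-target trials, which are the ones that determine performance at low false positive rates.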
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
