Exploring Binary Classification Loss For Speaker Verification
Bing Han, Zhengyang Chen, Yanmin Qian

TL;DR
This paper introduces SphereFace2, a binary-classifier-based framework for speaker verification that improves verification performance, especially on hard trials, is more robust to noisy labels, and narrows the gap between training and evaluation.
Contribution
It proposes a pair-wise training paradigm based on binary classifiers for speaker verification, which outperforms existing loss functions and is more robust to label noise.
Findings
SphereFace2 outperforms existing loss functions on VoxCeleb.
Large-margin fine-tuning further improves SphereFace2.
SphereFace2 demonstrates robustness to noisy labels.
Abstract
The mismatch between closed-set training and open-set testing usually leads to significant performance degradation in speaker verification. Among existing loss functions, metric-learning objectives depend strongly on finding effective pairs, which can hinder further improvement, and popular multi-class classification methods often degrade when evaluated on unseen speakers. In this work, we introduce the SphereFace2 framework, which uses several binary classifiers to train the speaker model in a pair-wise manner instead of performing multi-class classification. Benefiting from this learning paradigm, it can efficiently narrow the gap between training and evaluation. Experiments conducted on VoxCeleb show that SphereFace2 outperforms other existing loss functions, especially on hard trials. Besides, large-margin fine-tuning strategy is proven to be compatible…
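The idea of training with several binary classifiers rather than one multi-class softmax can be illustrated with a minimal sketch. It is not the paper's exact loss: it assumes one binary classifier per training speaker, scored by cosine similarity between an embedding and a per-speaker weight vector, with a logistic loss on each classifier. The function names and the `scale`, `margin`, `bias`, and `pos_weight` values are hypothetical and chosen only for illustration.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binary_pairwise_loss(embedding, speaker_weights, label,
                         scale=30.0, margin=0.2, bias=0.0, pos_weight=0.7):
    # Hypothetical hyperparameters; the source does not state any values.
    # Each speaker k has its own binary classifier "is this speaker k?".
    loss = 0.0
    for k, weight in enumerate(speaker_weights):
        cos = cosine(embedding, weight)
        if k == label:
            # Positive pair: penalize similarity that falls below the margin.
            loss += pos_weight * softplus(-(scale * (cos - margin) + bias))
        else:
            # Negative pair: penalize similarity that rises above -margin.
            loss += (1.0 - pos_weight) * softplus(scale * (cos + margin) + bias)
    return loss / scale


# Toy usage: three speakers in a 2-D embedding space.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
embedding = [1.0, 0.0]  # points exactly at speaker 0
loss_correct = binary_pairwise_loss(embedding, weights, label=0)
loss_wrong = binary_pairwise_loss(embedding, weights, label=1)
print(loss_correct < loss_wrong)
```

Because every classifier makes an independent same-speaker or different-speaker decision, the training objective resembles the pair-wise accept/reject decision made at verification time, which is the motivation the abstract gives for the smaller training-evaluation gap.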
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
