Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System
Ju-ho Kim, Hye-jin Shim, Jee-weon Jung, and Ha-Jin Yu

TL;DR
This paper introduces a supervised learning method using the Mean Teacher model to enhance speaker verification systems' ability to generalize to unseen speakers by producing more stable and discriminative speaker embeddings.
Contribution
The paper applies the Mean Teacher approach to speaker verification, demonstrating improved generalization and embedding stability over traditional methods.
Findings
11.61% relative performance improvement on VoxCeleb1
Mean Teacher produces more accurate speaker embeddings
Enhanced discrimination between speakers
Abstract
Most speaker verification tasks are studied under an open-set evaluation scenario that reflects real-world conditions. Thus, the power to generalize to unseen speakers is of paramount importance to the performance of a speaker verification system. We propose to apply Mean Teacher, a temporal averaging model, to extract speaker embeddings with small intra-class variance and large inter-class variance. The mean teacher network refers to the temporal averaging of deep neural network parameters; it can produce more accurate and stable representations than the weights obtained when training finishes. By learning the reliable intermediate representation of the mean teacher network, we expect that the proposed method can explore more discriminative embedding spaces and improve the generalization performance of the speaker verification system. Experimental results on the VoxCeleb1 test…
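The "temporal averaging of deep neural network parameters" mentioned in the abstract is usually realized as an exponential moving average (EMA) of the student network's weights. The sketch below illustrates that update on plain Python lists. It is a minimal illustration, not the authors' implementation: the function name `ema_update`, the toy weight values, and the decay of 0.5 are all assumptions chosen for readability (decay values close to 1 are more typical in practice).

```python
def ema_update(teacher, student, decay=0.99):
    """Return updated teacher weights: decay * teacher + (1 - decay) * student.

    `teacher` and `student` are flat lists of floats standing in for network
    parameters (a hypothetical simplification of real weight tensors).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy "training run": the student weights oscillate from step to step,
# while the teacher tracks a smoothed average of them.
student_steps = [[1.0, -1.0], [3.0, 1.0], [1.0, -1.0], [3.0, 1.0]]

teacher = list(student_steps[0])  # initialise the teacher from the student
for student in student_steps[1:]:
    teacher = ema_update(teacher, student, decay=0.5)

print(teacher)  # → [2.25, 0.25]
```

The final teacher weights sit between the two values the student alternates between, which is the smoothing effect the abstract relies on when it describes the mean teacher's representations as more stable than those of the final trained weights.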
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
