The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021
Miao Zhao, Yufeng Ma, Min Liu, Minqiang Xu

TL;DR
This report presents a speaker verification system for the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). The system is a fusion of nine models trained solely on VoxCeleb2-dev data, and it took first place in tracks 1 and 2.
Contribution
The report describes a multi-model fusion system that combines data augmentation, several network structures, domain-based large margin fine-tuning, and back-end refinement, and that ranked first in both tracks of VoxSRC 2021.
Findings
Achieved first place in VoxSRC 2021 tracks 1 and 2.
Attained a minDCF of 0.1034 and EER of 1.8460%.
Demonstrated effectiveness of model fusion and domain adaptation techniques.
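The two metrics reported above, EER and minDCF, can be illustrated with a minimal sketch. It is not the challenge's official scoring tool. The function names are hypothetical, and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are assumptions, since this summary does not state them. Tied scores are handled one trial at a time for simplicity.

```python
def error_rates(scores, labels):
    """Sweep a threshold over the sorted scores.

    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    Returns two lists: miss rate (FNR) and false-alarm rate (FPR) at each
    threshold position, starting with a threshold below every score.
    """
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    misses, false_alarms = 0, n_non
    fnr, fpr = [0.0], [1.0]  # accept everything: no misses, all false alarms
    for _, label in sorted(zip(scores, labels)):
        # Raising the threshold just above this score rejects this trial.
        if label == 1:
            misses += 1
        else:
            false_alarms -= 1
        fnr.append(misses / n_tar)
        fpr.append(false_alarms / n_non)
    return fnr, fpr


def compute_eer(scores, labels):
    """Equal error rate: the point where FNR and FPR are closest."""
    fnr, fpr = error_rates(scores, labels)
    i = min(range(len(fnr)), key=lambda k: abs(fnr[k] - fpr[k]))
    return (fnr[i] + fpr[i]) / 2


def compute_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost (parameter values are assumptions)."""
    fnr, fpr = error_rates(scores, labels)
    costs = [c_miss * p_target * m + c_fa * (1 - p_target) * f
             for m, f in zip(fnr, fpr)]
    # Normalize by the cost of the better trivial system
    # (always reject or always accept).
    default_cost = min(c_miss * p_target, c_fa * (1 - p_target))
    return min(costs) / default_cost


if __name__ == "__main__":
    # Toy trial list, invented for illustration only.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    print(compute_eer(toy_scores, toy_labels))
    print(compute_min_dcf(toy_scores, toy_labels))
```

With a low target prior such as the assumed 0.05, false alarms are weighted far more heavily than misses, so the threshold that minimizes the cost usually sits well above the EER threshold.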
Abstract
This report describes our submission to track 1 and track 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). Both tracks share the same speaker verification system, which uses only VoxCeleb2-dev as its training set. The report covers data augmentation, network structures, domain-based large margin fine-tuning, and back-end refinement. Our system is a fusion of 9 models and achieves first place in both tracks of VoxSRC 2021. The minDCF of our submission is 0.1034, and the corresponding EER is 1.8460%.
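The abstract says the system is a fusion of 9 models but does not say how the scores are combined. As a generic illustration of score-level fusion, and not the authors' method, the sketch below z-normalizes each system's trial scores and takes a weighted average. The function names and the equal default weights are assumptions made for this example.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one system's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant-score system carries no information; map it to zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(system_scores, weights=None):
    """Weighted average of per-system scores over the same trial list.

    system_scores: list of score lists, one per system, aligned by trial.
    weights: one weight per system; defaults to equal weights (an assumption).
    """
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    normalized = [z_normalize(s) for s in system_scores]
    n_trials = len(normalized[0])
    return [
        sum(w * sys_scores[i] for w, sys_scores in zip(weights, normalized))
        for i in range(n_trials)
    ]


if __name__ == "__main__":
    # Two hypothetical systems whose raw scores live on different scales.
    cosine_like = [0.1, 0.5, 0.9]
    plda_like = [-20.0, 5.0, 30.0]
    print(fuse_scores([cosine_like, plda_like]))
```

Normalizing before averaging keeps a system with a wide raw score range from dominating the fused score. In practice, fusion weights are often tuned on a held-out development set.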
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
