Tongji University Team for the VoxCeleb Speaker Recognition Challenge 2020
Rui Wang, Zhihua Wei, Yibin Zhan, Zhuoxi Chen

TL;DR
This paper details Tongji University's submission to the VoxCeleb Speaker Recognition Challenge 2020. The team trains ResNet-34 based systems with several loss functions, applies offline and online data augmentation plus score normalization, and reports 0.2800 DCF and 4.7770% EER for a fusion of five systems.
Contribution
The team investigates multiple ResNet-34 variants with different loss functions and data augmentation techniques, and applies score normalization for improved speaker recognition performance.
Findings
Achieved 0.2800 DCF and 4.7770% EER on the challenge.
Used offline and online data augmentation to diversify the training set, and score normalization in post-processing.
Fused five selected systems to obtain the best submitted result.
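For readers unfamiliar with the EER metric quoted above, the following is a minimal sketch of how an equal error rate can be estimated from trial scores. It is not the challenge's official scoring tool; the function name `equal_error_rate` and the toy scores are assumptions made for illustration. It sweeps every observed score as a threshold and returns the point where the false acceptance and false rejection rates are closest.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER (as a fraction) by sweeping all observed scores as thresholds.

    Hypothetical helper for illustration; not the official VoxSRC scorer.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up): one target trial scores below one non-target trial.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(f"EER = {100 * equal_error_rate(targets, nontargets):.2f}%")
```

A production scorer would typically interpolate between operating points; this sketch only picks the closest observed threshold.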
Abstract
In this report, we describe the Tongji University team's submission to the CLOSE track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020 at Interspeech 2020. We investigate different speaker recognition systems based on the popular ResNet-34 architecture, and train multiple variants with various loss functions. Both offline and online data augmentation are introduced to improve the diversity of the training set, and score normalization with an exhaustive grid search is applied in post-processing. Our best fusion of five selected systems for the CLOSE track achieves 0.2800 DCF and 4.7770% EER on the challenge.
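The abstract does not say which score normalization variant the team used or what the grid search covers. As one common possibility, here is a hedged sketch of symmetric score normalization (S-norm) with an optional top-k cohort selection. The function names, the `top_k` parameter, and the toy numbers are all assumptions for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def _cohort_stats(cohort_scores, top_k=None):
    """Mean and standard deviation of cohort scores, optionally the top_k highest."""
    scores = sorted(cohort_scores, reverse=True)
    if top_k is not None:
        scores = scores[:top_k]
    # Guard against a zero standard deviation on degenerate cohorts.
    return mean(scores), max(pstdev(scores), 1e-8)


def s_norm(raw_score, enroll_cohort_scores, test_cohort_scores, top_k=None):
    """Symmetric normalization: average the z-scores of the raw trial score
    against the enrollment-side and test-side cohort score distributions.

    Assumed variant; the source does not specify the normalization used.
    """
    mu_e, sd_e = _cohort_stats(enroll_cohort_scores, top_k)
    mu_t, sd_t = _cohort_stats(test_cohort_scores, top_k)
    return 0.5 * ((raw_score - mu_e) / sd_e + (raw_score - mu_t) / sd_t)


# Toy example (made-up scores): cohort means 1 and 2, both with std 1.
print(s_norm(3.0, [0.0, 2.0], [1.0, 3.0]))
```

Under this assumed setup, a grid search could evaluate a validation metric such as EER for each candidate `top_k` and keep the best value.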
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
