Tongji University Team for the VoxCeleb Speaker Recognition Challenge 2020

Rui Wang; Zhihua Wei; Yibin Zhan; Zhuoxi Chen

arXiv:2010.08179 · eess.AS · October 19, 2020

TL;DR

This paper details Tongji University's submission to the VoxCeleb Speaker Recognition Challenge 2020, exploring ResNet-34-based systems with data augmentation, various loss functions, and score normalization; its best five-system fusion reaches 0.2800 DCF and 4.7770% EER.
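
The report trains ResNet-34 variants with several loss functions, but this page does not name them. As one illustration only, assuming the additive margin softmax (AM-Softmax) loss that is common in speaker verification, a minimal pure-Python sketch of the per-example loss could look like this. The function names and the margin/scale defaults (0.2 and 30) are hypothetical, not the team's settings.

```python
import math


def _l2_normalize(v):
    """Scale a vector to unit length (assumes it is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def am_softmax_loss(embedding, class_weights, label, margin=0.2, scale=30.0):
    """Hypothetical AM-Softmax loss for a single embedding.

    embedding: speaker embedding (list of floats)
    class_weights: one weight vector per training speaker
    label: index of the true speaker
    margin, scale: illustrative defaults, not taken from the report
    """
    e = _l2_normalize(embedding)
    # Cosine similarity between the embedding and every class weight vector
    cosines = [sum(a * b for a, b in zip(e, _l2_normalize(w))) for w in class_weights]
    # Subtract the additive margin from the target-class cosine only, then scale
    logits = [scale * (c - margin) if j == label else scale * c
              for j, c in enumerate(cosines)]
    # Numerically stable log-sum-exp gives the softmax cross-entropy
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]
```

Setting `margin=0.0` recovers a plain softmax over scaled cosines; a positive margin keeps the loss high until the target cosine beats the other classes by at least that margin, which tends to make embeddings of the same speaker more compact.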

Contribution

The team investigates multiple ResNet-34 variants with different loss functions and data augmentation techniques, and applies score normalization for improved speaker recognition performance.
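
This page does not say which score normalization the team applied. Assuming adaptive symmetric normalization (AS-norm), a common choice for cosine-scored speaker embeddings, the sketch below normalizes a trial score with the mean and standard deviation of each side's top-k scores against an impostor cohort. The cohort, the function names and the default `top_k=300` are hypothetical.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_cohort_stats(emb, cohort, top_k):
    """Mean and std of the top_k highest scores of emb against the cohort."""
    scores = sorted((cosine(emb, c) for c in cohort), reverse=True)[:top_k]
    # Guard against a zero std (e.g. a single cohort score)
    return mean(scores), max(pstdev(scores), 1e-8)


def as_norm(enroll, test, cohort, top_k=300):
    """Hypothetical AS-norm: average of enroll-side and test-side z-scores."""
    raw = cosine(enroll, test)
    mu_e, sd_e = top_k_cohort_stats(enroll, cohort, top_k)
    mu_t, sd_t = top_k_cohort_stats(test, cohort, top_k)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```

Because both sides are normalized the same way, the result does not depend on which utterance is treated as enrollment, and values such as `top_k` are natural candidates for a grid search on a development trial list.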

Findings

01

Achieved 0.2800 DCF and 4.7770% EER on the challenge.

02

Demonstrated effectiveness of data augmentation and score normalization.

03

Obtained the best result by fusing five selected systems.
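
The figures above use the challenge metrics: equal error rate (EER) and the detection cost function (DCF). As a rough sketch of how both can be computed from trial scores, the function below sweeps a decision threshold over the sorted scores. The cost parameters (P_target = 0.05, C_miss = C_fa = 1) are our assumption about the VoxSRC 2020 setting; the challenge's official scoring script remains the reference.

```python
def eer_and_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Return (EER, normalized minDCF) for a list of trial scores.

    scores: higher means "same speaker"; labels: 1 = target, 0 = non-target.
    EER is approximated at the threshold where miss and false-alarm rates are closest.
    """
    pairs = sorted(zip(scores, labels))
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    # Threshold below every score: all trials accepted (no misses, all false alarms)
    misses, false_alarms = 0, n_non
    fnr, fpr = [0.0], [1.0]
    for i, (score, label) in enumerate(pairs):
        # Raising the threshold past this trial rejects it
        if label:
            misses += 1
        else:
            false_alarms -= 1
        # Only record an operating point once all tied scores are rejected
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        fnr.append(misses / n_tar)
        fpr.append(false_alarms / n_non)

    best = min(range(len(fnr)), key=lambda k: abs(fnr[k] - fpr[k]))
    eer = (fnr[best] + fpr[best]) / 2

    costs = [c_miss * p_target * m + c_fa * (1 - p_target) * f for m, f in zip(fnr, fpr)]
    # Normalize by the cost of the better trivial system (always accept or always reject)
    min_dcf = min(costs) / min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, min_dcf
```

The EER is returned as a fraction, so the reported 4.7770% corresponds to a first return value of 0.04777.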

Abstract

In this report, we describe the submission of the Tongji University team to the CLOSE track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020 at Interspeech 2020. We investigate different speaker recognition systems based on the popular ResNet-34 architecture and train multiple variants with various loss functions. Both offline and online data augmentation are introduced to increase the diversity of the training set, and score normalization with an exhaustive grid search is applied in post-processing. Our best fusion of five selected systems for the CLOSE track achieves 0.2800 DCF and 4.7770% EER on the challenge.
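
The abstract mentions online data augmentation but gives no details here. A minimal sketch, assuming additive noise mixed at a randomly drawn signal-to-noise ratio each time an utterance is loaded: the noise bank, the function names and the 0 to 15 dB SNR range are hypothetical placeholders, not the team's configuration.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db."""
    # Loop (tile) or crop the noise to the speech length
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def online_augment(speech, noise_bank, snr_range=(0.0, 15.0), rng=random):
    """Pick a random noise clip and SNR on the fly (hypothetical parameters)."""
    noise = rng.choice(noise_bank)
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(speech, noise, snr_db)
```

Because a new noise clip and SNR are drawn on every call, each epoch sees a differently corrupted copy of the same utterance, whereas offline augmentation typically writes a fixed set of corrupted copies to disk once.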

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing