Tongji University Undergraduate Team for the VoxCeleb Speaker Recognition Challenge 2020
Shufan Shen, Ran Miao, Yi Wang, Zhihua Wei

TL;DR
This paper describes the Tongji University undergraduate team's submission to the VoxCeleb Speaker Recognition Challenge 2020. The system uses an enhanced ResNet34 model with denoising modules, data augmentation, and score fusion to improve speaker verification accuracy.
Contribution
The team applied the RSBU-CW denoising module to ResNet34 and used data augmentation and score fusion, achieving competitive results in the VoxCeleb challenge.
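The source does not spell out the internals of the RSBU-CW module. As a rough illustration only, the sketch below shows the channel-wise soft thresholding that residual shrinkage units are generally built around: each channel gets its own threshold, and small-magnitude activations (treated as noise) are set to zero. The function names, the flat-list feature layout, and the `alphas` inputs are assumptions made for this example; in a trained network the per-channel scaling factors would come from a small learned sub-network rather than being passed in by hand.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values with |x| <= tau become 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def channel_wise_shrink(feature_map, alphas):
    """Apply soft thresholding with one threshold per channel.

    feature_map: list of channels, each a flat list of floats (assumed layout).
    alphas: per-channel scaling factors in (0, 1). Hypothetical inputs here;
            a real module would predict them from the features.
    """
    out = []
    for channel, alpha in zip(feature_map, alphas):
        # Threshold = scaling factor times the mean absolute activation.
        mean_abs = sum(abs(v) for v in channel) / len(channel)
        tau = alpha * mean_abs
        out.append([soft_threshold(v, tau) for v in channel])
    return out


# Toy input: channel 0 has one strong activation among small ones.
features = [[0.1, -0.2, 3.0, -0.1], [1.0, -1.0, 0.5, -0.5]]
shrunk = channel_wise_shrink(features, alphas=[0.5, 0.5])
```

In the first toy channel the three small values fall below the threshold and are zeroed, while the strong activation is kept with a reduced magnitude, which is the denoising behaviour the module is meant to provide.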
Findings
Achieved 0.2973 DCF and 4.97% EER on the challenge evaluation set.
Enhanced ResNet34 with RSBU-CW improves denoising in speaker recognition.
Fusion of two models boosts overall performance.
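The summary does not say how the two systems' scores were combined. One common approach, shown here purely as an assumed example, is to z-normalize each system's trial scores and take a weighted average. The equal weighting, the normalization step, and the toy scores below are illustrative choices, not the team's reported configuration.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted average of two systems' normalized scores, trial by trial.

    weight_a is a hypothetical fusion weight; it would normally be tuned
    on a held-out validation set.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(norm_a, norm_b)]


# Toy similarity scores from two systems on the same four trials.
system_a = [0.82, 0.10, 0.55, 0.30]
system_b = [0.40, 0.05, 0.45, 0.10]
fused = fuse_scores(system_a, system_b)
```

Normalizing first keeps one system from dominating the average simply because its raw scores span a wider range.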
Abstract
In this report, we describe the submission of the Tongji University undergraduate team to the CLOSE track of the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020 at Interspeech 2020. We applied the RSBU-CW module to the ResNet34 framework to improve the denoising ability of the network and to better complete the speaker verification task in complex environments. We trained two variants of ResNet and used score fusion and data augmentation to improve the performance of the model. Our fusion of two selected systems for the CLOSE track achieves 0.2973 DCF and 4.97% EER on the challenge evaluation set.
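For readers unfamiliar with the EER figure quoted above, the following is a minimal sketch of how an equal error rate can be estimated from trial scores: sweep a decision threshold and find the point where the false-acceptance and false-rejection rates are closest. This is a simplified approximation written for illustration; official scoring tools may interpolate between thresholds, and the function name and toy data are assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER from similarity scores and 0/1 labels.

    labels: 1 for same-speaker (target) trials, 0 for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # False rejection: target trials scored below the threshold.
        frr = sum(s < threshold for s in targets) / len(targets)
        # False acceptance: non-target trials scored at or above it.
        far = sum(s >= threshold for s in nontargets) / len(nontargets)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial list: one target scores below one non-target.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer = equal_error_rate(scores, labels)
```

A lower EER means the system can be tuned to make fewer errors of both kinds at once; the reported 4.97% corresponds to the operating point where both error rates are roughly equal.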
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: 1x1 Convolution · Average Pooling · Batch Normalization · Residual Connection · Residual Block · Bottleneck Residual Block · Max Pooling · Convolution · Kaiming Initialization
