XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
Jie Wang, Fuchuang Tong, Zhicong Chen, Lin Li, Qingyang Hong, Haodong Zhou

TL;DR
This paper presents the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021: ResNet34-SE and ECAPA-TDNN systems for track 2, and a track 4 diarisation system whose VAD module greatly improves performance.
Contribution
The paper describes a track 4 speaker diarisation system in which a VAD module greatly improves performance, alongside ResNet34-SE and ECAPA-TDNN speaker recognition systems evaluated for track 2.
Findings
DER of 5.54% on the track 4 evaluation set
JER of 27.11% on the track 4 evaluation set
DER of 2.92% on the track 4 development set
JER of 20.84% on the track 4 development set
Abstract
This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For track 2, we evaluate two systems, ResNet34-SE and ECAPA-TDNN. For track 4, an important part of our system is the VAD module, which greatly improves performance. Our best track 4 submission obtains a DER of 5.54% and a JER of 27.11% on the evaluation set, and a DER of 2.92% and a JER of 20.84% on the development set.
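For readers unfamiliar with the metric, the sketch below shows how a diarisation error rate (DER) is conventionally computed from its three error components. It is an illustration of the standard definition, not the challenge's scoring tool or the authors' code; the function name and the example durations are hypothetical. Errors made by a VAD module show up in the missed-speech and false-alarm terms, which is one reason VAD quality can affect DER.

```python
def diarisation_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER in percent.

    All arguments are durations in seconds:
      missed       - reference speech that the system labelled as non-speech
      false_alarm  - non-speech that the system labelled as speech
      confusion    - speech attributed to the wrong speaker
      total_speech - total duration of reference speech
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


# Hypothetical durations, chosen only to illustrate the arithmetic.
der = diarisation_error_rate(
    missed=2.0, false_alarm=1.0, confusion=3.0, total_speech=100.0
)
print(f"DER = {der:.2f}%")
```

Because the denominator is the reference speech duration rather than the total audio duration, false alarms can in principle push DER above 100%.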
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
