XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021

Jie Wang; Fuchuang Tong; Zhicong Chen; Lin Li; Qingyang Hong; Haodong Zhou

arXiv:2109.02549 · eess.AS · September 7, 2021

TL;DR

This paper presents the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021: two speaker verification systems (ResNet34-SE and ECAPA-TDNN) for track 2, and a diarisation system for track 4 in which a VAD module substantially improves accuracy.

Contribution

The paper introduces a speaker diarisation system whose voice activity detection (VAD) module substantially improves diarisation performance in the VoxCeleb Challenge.
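To make the role of a VAD module concrete, here is a generic energy-threshold detector. It is not the XMUSPEECH VAD, whose design the abstract does not describe. This is a minimal sketch, assuming a mono signal given as a list of float samples; the function name `energy_vad`, the 25 ms / 10 ms framing and the -30 dB threshold are hypothetical illustrative choices.

```python
import math
from typing import List, Sequence


def energy_vad(samples: Sequence[float], sample_rate: int,
               frame_ms: float = 25.0, hop_ms: float = 10.0,
               threshold_db: float = -30.0) -> List[bool]:
    """Per-frame speech (True) / non-speech (False) decisions.

    Hypothetical illustration: a frame is labelled speech if its RMS energy
    lies within `threshold_db` of the loudest frame in the recording.
    """
    frame = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    starts = range(0, max(len(samples) - frame, 0) + 1, hop)
    rms = [math.sqrt(sum(x * x for x in samples[s:s + frame]) / frame) for s in starts]
    loudest = max(rms)
    if loudest == 0.0:
        return [False] * len(rms)  # all-silent input: no speech anywhere
    # Relative level in dB; the small epsilon keeps log10 defined for silent frames.
    return [20 * math.log10(r / loudest + 1e-12) > threshold_db for r in rms]


# Toy signal at 1 kHz: 0.2 s silence, 0.2 s square wave, 0.2 s silence.
sig = [0.0] * 200 + [0.5 if i % 2 == 0 else -0.5 for i in range(200)] + [0.0] * 200
decisions = energy_vad(sig, sample_rate=1000)
print(sum(decisions), "of", len(decisions), "frames marked as speech")  # 22 of 58 frames marked as speech
```

In a diarisation pipeline, speech frames that a VAD wrongly drops surface as missed speech in the diarisation error rate, and non-speech frames it keeps surface as false alarms, so VAD quality feeds directly into DER.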

Findings

01. DER of 5.54% on the evaluation set
02. JER of 27.11% on the evaluation set
03. DER of 2.92% on the development set

Abstract

This paper describes the XMUSPEECH speaker recognition and diarisation systems for the VoxCeleb Speaker Recognition Challenge 2021. For track 2, we evaluate two systems: ResNet34-SE and ECAPA-TDNN. For track 4, a key component of our system is a voice activity detection (VAD) module, which greatly improves performance. Our best submission on track 4 achieved a diarisation error rate (DER) of 5.54% and a Jaccard error rate (JER) of 27.11% on the evaluation set, and a DER of 2.92% and a JER of 20.84% on the development set.
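The two track 4 metrics, diarisation error rate (DER) and Jaccard error rate (JER), can be illustrated with a minimal frame-level sketch. It rests on simplifying assumptions: at most one speaker per frame, no forgiveness collar, and a brute-force one-to-one speaker mapping that is only practical for a few speakers. Official scoring tools also handle overlapping speech and use an efficient optimal assignment. The function names `der`, `jer` and `best_mapping` are hypothetical. DER divides missed speech plus false alarm plus speaker confusion by total reference speech. JER averages, over reference speakers, the fraction of each speaker's union with its paired hypothesis speaker that is missed or falsely detected.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Set

Label = Optional[str]  # speaker ID for a frame, or None for non-speech


def frames_by_speaker(labels: List[Label]) -> Dict[str, Set[int]]:
    """Map each speaker ID to the set of frame indices where it is active."""
    out: Dict[str, Set[int]] = {}
    for i, spk in enumerate(labels):
        if spk is not None:
            out.setdefault(spk, set()).add(i)
    return out


def best_mapping(ref: Dict[str, Set[int]], hyp: Dict[str, Set[int]],
                 score: Callable[[Set[int], Set[int]], float]) -> Dict[str, str]:
    """Brute-force one-to-one hyp->ref mapping maximising the summed score."""
    hyp_ids = sorted(hyp)
    # Pad with None so surplus hypothesis speakers can stay unmapped.
    ref_pad: List[Optional[str]] = sorted(ref) + [None] * max(0, len(hyp) - len(ref))
    best: Dict[str, str] = {}
    best_total = -1.0
    for perm in permutations(ref_pad, len(hyp_ids)):
        pairs = {h: r for h, r in zip(hyp_ids, perm) if r is not None}
        total = sum(score(ref[r], hyp[h]) for h, r in pairs.items())
        if total > best_total:
            best, best_total = pairs, total
    return best


def der(ref_labels: List[Label], hyp_labels: List[Label]) -> float:
    """(missed + false alarm + confusion) / total reference speech frames."""
    mapping = best_mapping(frames_by_speaker(ref_labels), frames_by_speaker(hyp_labels),
                           lambda r, h: len(r & h))  # maximise overlapping frames
    miss = fa = conf = 0
    for r, h in zip(ref_labels, hyp_labels):
        if r is not None and h is None:
            miss += 1
        elif r is None and h is not None:
            fa += 1
        elif r is not None and mapping.get(h) != r:
            conf += 1
    total = sum(r is not None for r in ref_labels)
    return (miss + fa + conf) / total


def jer(ref_labels: List[Label], hyp_labels: List[Label]) -> float:
    """Mean over reference speakers of (missed + false alarm) / union."""
    ref, hyp = frames_by_speaker(ref_labels), frames_by_speaker(hyp_labels)
    # Assumption: speakers are paired to maximise the summed Jaccard index.
    mapping = best_mapping(ref, hyp, lambda r, h: len(r & h) / len(r | h))
    paired = {r: h for h, r in mapping.items()}
    errors = []
    for spk, r_frames in ref.items():
        if spk not in paired:
            errors.append(1.0)  # an unpaired reference speaker counts as 100% error
            continue
        h_frames = hyp[paired[spk]]
        union = r_frames | h_frames
        errors.append(len(union - (r_frames & h_frames)) / len(union))
    return sum(errors) / len(errors)


ref = ["A", "A", "A", "B", "B", None]
hyp = ["x", "x", "y", "y", "y", "y"]
print(f"DER={der(ref, hyp):.3f} JER={jer(ref, hyp):.3f}")  # DER=0.400 JER=0.417
```

In the toy example, speaker x maps to A and y to B; frame 2 is a speaker confusion and frame 5 a false alarm, giving DER = 2/5. JER is usually higher than DER because it weights every reference speaker equally, so a short speaker tracked poorly costs as much as a dominant one.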

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing