ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023
Mengjie Du, Xiang Fang, Jie Li

TL;DR
This paper details ChinaTelecom's speaker recognition system for the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). The system fuses and calibrates several ResNet variants trained on VoxCeleb2 and reaches a minDCF of 0.1066 and an EER of 1.980%.
Contribution
A ResNet-based speaker recognition system for Track 1 (closed) of VoxSRC 2023 that combines several ResNet variants through score fusion and applies score calibration to each variant and to the fused system.
Findings
Achieved minDCF of 0.1066
Achieved EER of 1.980%
System based on ResNet variants trained on VoxCeleb2
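Both reported metrics are computed from trial scores and target/non-target labels. The sketch below is a minimal, stdlib-only Python illustration, not the official challenge scoring script. It assumes the minDCF cost parameters P_target = 0.05 and C_miss = C_fa = 1, which we believe match the VoxSRC setting but which this page does not state. The helper names `det_points`, `compute_eer` and `compute_min_dcf` are hypothetical.

```python
from itertools import groupby


def det_points(scores, labels):
    """Return (miss_rates, false_alarm_rates), one pair per distinct threshold.

    A trial is accepted when its score is >= the threshold.
    labels: 1 = target trial, 0 = non-target trial.
    """
    n_tar = sum(1 for y in labels if y == 1)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need both target and non-target trials")
    pairs = sorted(zip(scores, labels))
    misses, false_alarms = 0, n_non  # threshold below every score: accept all trials
    p_miss, p_fa = [0.0], [1.0]
    # Raise the threshold past one distinct score at a time; tied scores move together.
    for _, group in groupby(pairs, key=lambda pair: pair[0]):
        for _, y in group:
            if y == 1:
                misses += 1
            else:
                false_alarms -= 1
        p_miss.append(misses / n_tar)
        p_fa.append(false_alarms / n_non)
    return p_miss, p_fa


def compute_eer(scores, labels):
    """Approximate EER: the operating point where miss and false-alarm rates are closest."""
    p_miss, p_fa = det_points(scores, labels)
    i = min(range(len(p_miss)), key=lambda k: abs(p_miss[k] - p_fa[k]))
    return (p_miss[i] + p_fa[i]) / 2


def compute_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all thresholds (assumed cost parameters)."""
    p_miss, p_fa = det_points(scores, labels)
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return min(
        (c_miss * pm * p_target + c_fa * pf * (1 - p_target)) / norm
        for pm, pf in zip(p_miss, p_fa)
    )


scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(compute_eer(scores, labels))      # 0.5
print(compute_min_dcf(scores, labels))  # 0.5
```

The EER here is read off the discrete DET points rather than interpolated, so on small trial lists it can differ slightly from tools that interpolate between points.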
Abstract
This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). Our system consists of several ResNet variants trained only on VoxCeleb2, which were later fused for better performance. Score calibration was also applied to each variant and to the fused system. The final submission achieved a minDCF of 0.1066 and an EER of 1.980%.
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: 1x1 Convolution · Residual Connection · Bottleneck Residual Block · Average Pooling · Convolution · Batch Normalization · Residual Block · Kaiming Initialization · Global Average Pooling
