ChinaTelecom System Description to VoxCeleb Speaker Recognition Challenge 2023

Mengjie Du; Xiang Fang; Jie Li

arXiv:2308.08181 · cs.SD · August 17, 2023 · 1 citation

Open Access

TL;DR

This paper details ChinaTelecom's speaker recognition system for Track 1 (closed) of VoxSRC 2023: several ResNet variants trained only on VoxCeleb2, with score calibration applied to each variant and to their fusion, reaching a minDCF of 0.1066 and an EER of 1.980%.

Contribution

Introduction of a ResNet-based system with fusion and calibration techniques for improved speaker recognition in the VoxCeleb2023 challenge.

Findings

01. Achieved a minDCF of 0.1066

02. Achieved an EER of 1.980%

03. System based on ResNet variants trained on VoxCeleb2

Abstract

This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb2023 Speaker Recognition Challenge (VoxSRC 2023). Our system consists of several ResNet variants trained only on VoxCeleb2, which were later fused for better performance. Score calibration was also applied to each variant and to the fused system. The final submission achieved a minDCF of 0.1066 and an EER of 1.980%.
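For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how EER and normalised minDCF are commonly computed from trial scores. It is not the authors' evaluation code. The function names (`error_rates`, `compute_eer`, `compute_min_dcf`) are hypothetical, and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are an assumption about the challenge settings that the abstract does not state. Tied scores are not treated specially.

```python
def error_rates(scores, labels):
    """Sweep a threshold over the sorted scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns two lists (false-negative rates, false-positive rates),
    one entry per candidate threshold.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n_target = sum(1 for _, y in pairs if y == 1)
    n_nontarget = len(pairs) - n_target
    # Threshold below every score: everything accepted.
    fnrs, fprs = [0.0], [1.0]
    missed_targets = 0
    rejected_nontargets = 0
    for _, y in pairs:
        # Raising the threshold just above this score rejects this trial.
        if y == 1:
            missed_targets += 1
        else:
            rejected_nontargets += 1
        fnrs.append(missed_targets / n_target)
        fprs.append(1.0 - rejected_nontargets / n_nontarget)
    return fnrs, fprs


def compute_eer(scores, labels):
    """Equal error rate: the operating point where FNR and FPR are closest."""
    fnrs, fprs = error_rates(scores, labels)
    i = min(range(len(fnrs)), key=lambda k: abs(fnrs[k] - fprs[k]))
    return (fnrs[i] + fprs[i]) / 2.0


def compute_min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost, normalised by the best trivial system.

    The default cost parameters are assumed, not taken from the report.
    """
    fnrs, fprs = error_rates(scores, labels)
    min_cost = min(
        c_miss * fnr * p_target + c_fa * fpr * (1.0 - p_target)
        for fnr, fpr in zip(fnrs, fprs)
    )
    # Cost of always rejecting or always accepting, whichever is cheaper.
    default_cost = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return min_cost / default_cost


# Toy trial list (made up for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_eer = compute_eer(toy_scores, toy_labels)
toy_min_dcf = compute_min_dcf(toy_scores, toy_labels)
```

Both metrics are computed from the same threshold sweep. EER reports the single point where the two error types balance, while minDCF weights them by an assumed target prior and costs, which is why a system can rank differently on the two.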

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: 1x1 Convolution · Residual Connection · Bottleneck Residual Block · Average Pooling · Convolution · Batch Normalization · Residual Block · Kaiming Initialization · Global Average Pooling