The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

Xiaoyi Qin; Na Li; Yuke Lin; Yiwei Ding; Chao Weng; Dan Su; Ming Li

arXiv:2210.05092 · cs.SD · October 12, 2022 · 6 cites

TL;DR

This paper describes the DKU-Tencent system for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC22). It covers track1, which targets cross-age speaker recognition, and track3, the semi-supervised domain adaptation task, and reports 0.107 mDCF and 7.135% EER on them respectively.

Contribution

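For track1, the system calibrates scores on cross-age trials with QMF (quality measure functions), where magnitude-based quality measures give a large improvement. For track3, it performs semi-supervised domain adaptation with pseudo labels and replaces ArcFace with Sub-center ArcFace to cope with noisy cluster labels.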

Findings

01

Achieved 0.107 mDCF in track1

02

Achieved 7.135% EER in track3

03

Magnitude-based quality measures give a large improvement in track1 score calibration; pseudo-label domain adaptation with Sub-center ArcFace is used for track3
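
For readers unfamiliar with the EER figure quoted above, the following is a minimal sketch of how an equal error rate can be approximated from a list of trial scores. It is not the challenge's official scoring tool. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy trial list are all assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER of a verification system.

    scores: one similarity score per trial (higher means "same speaker").
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    best_gap, eer = float("inf"), None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        frr = false_rejects / n_target
        far = false_accepts / n_nontarget
        # The EER is the operating point where the two error rates meet.
        if abs(far - frr) < best_gap:
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer


# Toy trial list (hypothetical scores, not challenge data).
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
```

On the toy list, one target trial scores below one non-target trial, so the two error rates meet at one third. Production scoring tools typically interpolate along the ROC curve instead of sweeping observed scores, which gives a smoother estimate on large trial lists.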

Abstract

This paper is the system description of the DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC22). In this challenge, we focus on track1 and track3. For track1, multiple backbone networks are adopted to extract frame-level features. Since track1 focuses on cross-age scenarios, we adopt cross-age trials and perform QMF to calibrate the scores. The magnitude-based quality measures achieve a large improvement. For track3, the semi-supervised domain adaptation task, a pseudo-label method is adopted for domain adaptation. Because clustering produces noisy labels, ArcFace is replaced by Sub-center ArcFace. The final submission achieves 0.107 mDCF in track1 and 7.135% EER in track3.
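
The switch from ArcFace to Sub-center ArcFace mentioned in the abstract can be illustrated with a small sketch. In Sub-center ArcFace each class keeps several sub-centers, the cosine similarity to a class is the maximum over its sub-centers, and the usual ArcFace angular margin is then applied to the target class. Samples with wrong pseudo labels can drift toward a non-dominant sub-center instead of distorting the main one. The code below is a pure-Python illustration for a single embedding, not the authors' implementation. The function name, the margin of 0.2, the scale of 32, and the use of two sub-centers per class are assumptions, since the abstract does not give these settings.

```python
import math


def _normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def _cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(_normalize(u), _normalize(v)))


def subcenter_arcface_logits(embedding, subcenters, label,
                             margin=0.2, scale=32.0):
    """Compute Sub-center ArcFace logits for one embedding.

    subcenters: list over classes, each a list of K sub-center vectors.
    label: index of the (possibly noisy) pseudo label for this embedding.
    margin, scale: assumed hyperparameters, not taken from the paper.
    """
    logits = []
    for class_idx, centers in enumerate(subcenters):
        # Keep only the closest sub-center of each class.
        cos = max(_cosine(embedding, center) for center in centers)
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if class_idx == label:
            # ArcFace: add the angular margin to the target-class angle.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    return logits


# Toy example: two classes with two sub-centers each (hypothetical values).
toy_subcenters = [
    [[1.0, 0.0], [0.0, 1.0]],   # class 0
    [[0.0, 1.0], [-1.0, 0.0]],  # class 1
]
toy_logits = subcenter_arcface_logits([1.0, 0.0], toy_subcenters, label=0)
```

The resulting logits would be passed to a softmax cross-entropy loss during training. This sketch omits the handling some implementations add for angles where adding the margin would exceed π.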

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing