The SpeakIn System Description for CNSRC2022

Yu Zheng; Yihao Chen; Jinghan Peng; Yajun Zhang; Min Liu; Minqiang Xu

arXiv:2209.10846 · cs.SD · September 23, 2022 · 1 cites

TL;DR

This paper describes the SpeakIn speaker verification system for CNSRC 2022. It combines ResNet-, RepVGG-, and TDNN-based architectures with global statistic and MQMHA pooling and additional scoring techniques, and placed in the top three in both tracks of the SV task and in the SR task.

Contribution

The report introduces a multi-architecture speaker verification system whose MQMHA pooling and Large-Margin Fine-Tuning strategies improve performance in the CNSRC 2022 challenge.

Findings

01 Achieved 1st place in the open track of the SV task

02 Secured 2nd place in the fixed track of the SV task

03 Placed 3rd in the SR task

Abstract

This report describes our speaker verification systems for the tasks of the CN-Celeb Speaker Recognition Challenge 2022 (CNSRC 2022). The challenge includes two tasks: speaker verification (SV) and speaker retrieval (SR). The SV task has two tracks, a fixed track and an open track. In the fixed track, we used only CN-Celeb.T as the training set. For the open track of the SV task and for the SR task, we added our open-source audio data. We developed ResNet-based, RepVGG-based, and TDNN-based architectures for this challenge. A global statistic pooling structure and an MQMHA pooling structure were used to aggregate the frame-level features across time into an utterance-level representation. We adopted AM-Softmax and AAM-Softmax, combined with the Sub-Center method, to classify the resulting embeddings. We also used the Large-Margin Fine-Tuning strategy to further improve the model…
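The pooling and margin-based classification steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, tensor shapes, and the default `scale` and `margin` values are assumptions chosen for illustration, the MQMHA pooling structure is not shown, and the AAM-Softmax branch omits the handling of angles beyond π that practical implementations include.

```python
import numpy as np


def statistics_pooling(frames, eps=1e-8):
    """Aggregate frame-level features of shape (T, D) into one (2*D,) vector.

    The utterance-level representation is the per-dimension mean
    concatenated with the per-dimension standard deviation over time.
    """
    mean = frames.mean(axis=0)
    std = np.sqrt(frames.var(axis=0) + eps)
    return np.concatenate([mean, std])


def sub_center_margin_logits(embedding, class_weights, label,
                             scale=32.0, margin=0.2, additive_angular=False):
    """Margin-penalized logits for a single embedding.

    embedding:     (D,) utterance embedding.
    class_weights: (C, K, D) with K sub-centers per class (hypothetical layout).
    label:         index of the target class.

    AM-Softmax replaces the target cosine with cos(theta) - margin;
    AAM-Softmax replaces it with cos(theta + margin).
    """
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=-1, keepdims=True)

    # Sub-Center: score each class by its closest sub-center.
    cos = (w @ e).max(axis=1)  # shape (C,)

    logits = cos.copy()
    if additive_angular:
        theta = np.arccos(np.clip(cos[label], -1.0, 1.0))
        logits[label] = np.cos(theta + margin)  # simplified: no theta + m > pi guard
    else:
        logits[label] = cos[label] - margin
    return scale * logits


# Usage: pool 200 frames of 80-dimensional features, then score 10 classes.
rng = np.random.default_rng(0)
utterance = statistics_pooling(rng.normal(size=(200, 80)))   # shape (160,)
weights = rng.normal(size=(10, 2, utterance.shape[0]))       # 2 sub-centers per class
logits = sub_center_margin_logits(utterance, weights, label=3)
```

In training, these logits would be passed to a softmax cross-entropy loss; the margin makes the target class harder to satisfy, which pushes embeddings of the same speaker closer together.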

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques