The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022

Yu Zheng; Jinghan Peng; Yihao Chen; Yajun Zhang; Jialong Wang; Min Liu; Minqiang Xu

arXiv:2209.11625 · cs.SD · September 26, 2022 · 1 citation

TL;DR

This paper presents the SpeakIn speaker verification system for the FFSVC2022 challenge. The system combines ResNet-based and RepVGG-based architectures with a staged transfer learning method and model fusion, and it ranked 1st in both far-field speaker verification tasks.

Contribution

The paper introduces a staged transfer learning method and effective model fusion strategies that significantly improve far-field speaker verification performance.

Findings

01

Achieved 3.0049% EER in Task 1

02

Achieved 6.2060% EER in Task 2

03

Ranked 1st in both challenge tasks
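
The figures above are equal error rates (EER): the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from a list of trial scores, assuming NumPy is available. The function name `compute_eer` and the toy scores are hypothetical and for illustration only; this is not the challenge's official scoring tool, and tied scores are not given special treatment.

```python
import numpy as np

def compute_eer(scores, labels):
    """Estimate the equal error rate (as a fraction) for verification trials.

    scores: higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials by descending score. Placing the threshold after the
    # k-th sorted trial means the top k trials are accepted.
    order = np.argsort(-scores, kind="mergesort")
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # For k = 0..N accepted trials:
    #   false accepts = non-targets among the accepted trials
    #   false rejects = targets among the rejected trials
    false_accepts = np.concatenate(([0], np.cumsum(1 - labels)))
    false_rejects = n_target - np.concatenate(([0], np.cumsum(labels)))

    far = false_accepts / n_nontarget
    frr = false_rejects / n_target

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Toy trials (illustrative values, not challenge data).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
print(f"EER = {100 * compute_eer(toy_scores, toy_labels):.2f}%")
```

On real trial lists the two error curves rarely cross exactly at a sampled threshold, so scoring tools may interpolate between neighbouring points; the averaging step here is a simple stand-in for that.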

Abstract

This paper describes the speaker verification (SV) systems submitted by the SpeakIn team to Task 1 and Task 2 of the Far-Field Speaker Verification Challenge 2022 (FFSVC2022). The SV tasks of the challenge focus on fully supervised far-field speaker verification (Task 1) and semi-supervised far-field speaker verification (Task 2). In Task 1, we used the VoxCeleb and FFSVC2020 datasets as training data. For Task 2, we used only the VoxCeleb dataset as the training set. ResNet-based and RepVGG-based architectures were developed for this challenge. Global statistics pooling and MQMHA pooling were used to aggregate the frame-level features across time into an utterance-level representation. We adopted AM-Softmax and AAM-Softmax to classify the resulting embeddings. We propose a staged transfer learning method. In the pre-training stage we…
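
Two of the building blocks named in the abstract can be illustrated compactly: statistics pooling, which summarizes frame-level features by their mean and standard deviation over time, and the margin-based softmax variants, where AM-Softmax subtracts a margin from the target-class cosine and AAM-Softmax adds a margin to the target-class angle. The sketch below assumes NumPy; the function names, the margin of 0.2, and the scale of 32 are illustrative assumptions and are not taken from the paper. MQMHA pooling is not sketched because the abstract gives no details of it.

```python
import numpy as np

def statistics_pooling(frames, eps=1e-8):
    """Pool frame-level features of shape (T, D) into one (2*D,) vector
    by concatenating the per-dimension mean and standard deviation."""
    mean = frames.mean(axis=0)
    std = np.sqrt(frames.var(axis=0) + eps)
    return np.concatenate([mean, std])

def margin_logits(embedding, class_weights, target,
                  margin=0.2, scale=32.0, additive_angular=True):
    """Scaled cosine logits with a margin applied to the target class.

    additive_angular=True  -> AAM-Softmax style: cos(theta + margin)
    additive_angular=False -> AM-Softmax style:  cos(theta) - margin
    margin and scale are illustrative defaults, not the paper's settings.
    """
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(w @ e, -1.0, 1.0)  # cosine similarity to each class centre

    logits = cos.copy()
    if additive_angular:
        logits[target] = np.cos(np.arccos(cos[target]) + margin)
    else:
        logits[target] = cos[target] - margin
    return scale * logits

def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (numerically stabilized)."""
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

# Toy usage: 50 frames of 4-dim features, 3 hypothetical speaker classes.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))
utterance = statistics_pooling(frames)   # shape (8,)
weights = rng.normal(size=(3, 8))        # one weight row per class
loss = cross_entropy(margin_logits(utterance, weights, target=1), target=1)
print(utterance.shape, float(loss))
```

Practical AAM-Softmax implementations usually add a guard for the case where the angle plus the margin exceeds π; that refinement is omitted here to keep the sketch short.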

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques