The SpeakIn System for VoxCeleb Speaker Recognition Challange 2021

Miao Zhao; Yufeng Ma; Min Liu; Minqiang Xu

arXiv:2109.01989 · cs.SD · September 7, 2021 · 36 cites

TL;DR

This report presents the SpeakIn speaker verification system for the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021): a fusion of nine models trained solely on VoxCeleb2-dev that placed first in tracks 1 and 2.

Contribution

The report describes a multi-model fusion system built from data augmentation, several network structures, domain-based large margin fine-tuning, and back-end refinement, which took first place in both tracks it entered.

Findings

01 Achieved first place in VoxSRC 2021 tracks 1 and 2.

02 Attained a minDCF of 0.1034 and an EER of 1.8460%.

03 Demonstrated effectiveness of model fusion and domain adaptation techniques.

Abstract

This report describes our submission to track 1 and track 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC 2021). Both tracks share the same speaker verification system, which uses only VoxCeleb2-dev as its training set. The report explores several components, including data augmentation, network structures, domain-based large margin fine-tuning, and back-end refinement. Our system is a fusion of 9 models and achieves first place in both tracks of VoxSRC 2021. The minDCF of our submission is 0.1034, and the corresponding EER is 1.8460%.
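The two metrics quoted in the abstract, EER and minDCF, can be illustrated with a minimal sketch. This is not the challenge's official scoring code: the function names (`error_curve`, `equal_error_rate`, `min_dcf`) are hypothetical, the EER is approximated at the threshold where the miss and false-alarm rates are closest (no interpolation), and the operating point `p_target=0.05`, `c_miss=c_fa=1` is an assumption, since the abstract does not state the cost parameters. The toy scores and labels are invented for illustration only.

```python
def error_curve(scores, labels):
    """Return (miss_rate, false_alarm_rate) pairs, one per decision threshold.

    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    A trial is accepted when its score is at or above the threshold.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Walk thresholds from high to low, so trials are accepted in score order.
    trials = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    curve = [(1.0, 0.0)]  # threshold above every score: everything is rejected
    accepted_targets = 0
    accepted_nontargets = 0
    i = 0
    while i < len(trials):
        threshold = trials[i][0]
        # Accept every trial tied at this score before recording a point.
        while i < len(trials) and trials[i][0] == threshold:
            if trials[i][1] == 1:
                accepted_targets += 1
            else:
                accepted_nontargets += 1
            i += 1
        curve.append((1.0 - accepted_targets / n_target,
                      accepted_nontargets / n_nontarget))
    return curve


def equal_error_rate(scores, labels):
    """Approximate EER: average the two rates where they are closest."""
    miss, fa = min(error_curve(scores, labels), key=lambda p: abs(p[0] - p[1]))
    return (miss + fa) / 2.0


def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost (operating point is an assumption)."""
    best = min(c_miss * p_target * miss + c_fa * (1.0 - p_target) * fa
               for miss, fa in error_curve(scores, labels))
    # Normalize by the cost of the better trivial system (accept all / reject all).
    return best / min(c_miss * p_target, c_fa * (1.0 - p_target))


# Toy trial list, invented for illustration (not challenge data).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(f"EER = {equal_error_rate(scores, labels):.2%}, "
      f"minDCF = {min_dcf(scores, labels):.4f}")
```

Because minDCF weights false alarms far more heavily than misses at a low target prior, a system can rank differently under the two metrics, which is presumably why both figures are reported.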

Code & Models

Repositories

kaldi-asr/kaldi

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing