ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020

Shen Chen

arXiv:2011.01518 · cs.SD · November 4, 2020 · 1 citation

TL;DR

This paper describes ShaneRun's speaker recognition system for the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. The system uses ResNet-34 speaker embeddings and a simple score-fusion method based on t-SNE normalized distances, and it improves on the challenge baseline.

Contribution

A simple t-SNE-based score-fusion method, and the application of a ResNet-34 encoder to speaker embedding extraction for the VoxCeleb challenge.

Findings

1. Achieved 0.3098 minDCF on the fixed-data track, outperforming the baseline by 1.3%.
2. Achieved 5.076% EER, outperforming the baseline by 2.2%.
3. Demonstrated the effectiveness of fusion based on t-SNE normalized distances.
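The two headline numbers are standard speaker-verification metrics: the equal error rate (EER), where the miss rate and false-alarm rate coincide, and the minimum normalized detection cost (minDCF). The following is a minimal Python sketch of both, assuming higher scores mean "same speaker" and assuming cost parameters P_target = 0.05 and C_miss = C_fa = 1 (check the VoxSRC 2020 evaluation plan before relying on these values). The function names are hypothetical, and the EER is read off at the threshold where the two error rates are closest, which is only an approximation on small trial lists.

```python
from typing import List, Sequence, Tuple


def error_rates(targets: Sequence[float], nontargets: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (P_miss, P_fa) when trials scoring >= threshold are accepted."""
    p_miss = sum(s < threshold for s in targets) / len(targets)
    p_fa = sum(s >= threshold for s in nontargets) / len(nontargets)
    return p_miss, p_fa


def sweep(targets: Sequence[float],
          nontargets: Sequence[float]) -> List[Tuple[float, float]]:
    """Error rates at every observed score, plus +inf (reject every trial)."""
    thresholds = sorted(set(targets) | set(nontargets)) + [float("inf")]
    return [error_rates(targets, nontargets, t) for t in thresholds]


def compute_eer(targets: Sequence[float], nontargets: Sequence[float]) -> float:
    """Approximate EER: mean of P_miss and P_fa where the two are closest."""
    p_miss, p_fa = min(sweep(targets, nontargets), key=lambda r: abs(r[0] - r[1]))
    return (p_miss + p_fa) / 2


def compute_min_dcf(targets: Sequence[float], nontargets: Sequence[float],
                    p_target: float = 0.05, c_miss: float = 1.0,
                    c_fa: float = 1.0) -> float:
    """Minimum detection cost over all thresholds, normalized by the cost of
    the better trivial system (accept every trial or reject every trial)."""
    norm = min(c_miss * p_target, c_fa * (1 - p_target))
    return min((c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)) / norm
               for p_miss, p_fa in sweep(targets, nontargets))
```

For the toy lists targets = [0.6, 0.4] and nontargets = [0.5, 0.3], both functions return 0.5. A percentage such as the 5.076% above corresponds to compute_eer(...) * 100. The sweep is quadratic in the number of trials, so a real evaluation would sort the scores once instead.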

Abstract

In this report, we describe the submission of ShaneRun's team to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. We use ResNet-34, adapted from the open-source voxceleb-trainer, as the encoder to extract speaker embeddings. We also provide a simple method for optimum fusion that uses the t-SNE normalized distance of each test utterance pair instead of the original negative Euclidean distance between the encoder's embeddings. The final submitted system achieved 0.3098 minDCF and 5.076% EER on the fixed-data track, outperforming the baseline by 1.3% in minDCF and 2.2% in EER, respectively.
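The abstract contrasts two ways of scoring a test pair: the negative Euclidean distance between encoder embeddings, and a t-SNE normalized distance used for fusion. The exact normalization is not spelled out here, so the sketch below assumes one plausible reading: each trial distance d is mapped through the Student-t kernel 1 / (1 + d²) that t-SNE uses for its low-dimensional affinities, normalized over all trials of one system, and the normalized scores of several systems are averaged. All function names are hypothetical, and the toy embeddings stand in for encoder outputs such as ResNet-34 embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Sequence[float]
Trial = Tuple[str, str]  # (enrollment utterance id, test utterance id)


def euclidean(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two speaker embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trial_distances(embeddings: Dict[str, Embedding],
                    trials: List[Trial]) -> List[float]:
    """Distance for every trial pair; the baseline score is its negative."""
    return [euclidean(embeddings[u], embeddings[v]) for u, v in trials]


def tsne_normalize(distances: List[float]) -> List[float]:
    """ASSUMPTION: t-SNE-style normalization via the Student-t kernel.

    Each distance d becomes 1 / (1 + d^2); the values are then divided by
    their sum over all trials, so one system's scores sum to 1. The kernel
    decreases with d, so within one system the trial ranking matches the
    negative-Euclidean baseline.
    """
    kernel = [1.0 / (1.0 + d * d) for d in distances]
    total = sum(kernel)
    return [k / total for k in kernel]


def fuse(per_system_distances: List[List[float]]) -> List[float]:
    """Average the normalized scores of several systems, trial by trial.

    All systems must score the same trials in the same order.
    """
    normalized = [tsne_normalize(d) for d in per_system_distances]
    n_systems = len(normalized)
    return [sum(scores) / n_systems for scores in zip(*normalized)]


# Toy usage: two hypothetical systems with the same geometry at different scales.
trials = [("a", "b"), ("a", "c")]
system_1 = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
system_2 = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [6.0, 8.0]}
fused = fuse([trial_distances(system_1, trials), trial_distances(system_2, trials)])
```

In the toy run, the first trial (distances 1 and 2) receives a fused score of roughly 0.94 and the second (distances 5 and 10) roughly 0.06. The kernel itself is not scale-invariant; it is the per-system normalization to a sum of 1 that gives each system equal total weight in the average.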

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques