VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge

Arsha Nagrani; Joon Son Chung; Jaesung Huh; Andrew Brown; Ernesto Coto; Weidi Xie; Mitchell McLaren; Douglas A Reynolds; Andrew Zisserman

arXiv:2012.06867 · cs.SD · December 15, 2020 · 65 cites

TL;DR

The VoxSRC 2020 challenge evaluated the state of the art in speaker recognition and diarisation on unconstrained YouTube data, providing datasets, benchmarks, and insights into progress since the previous challenge.

Contribution

This paper introduces the second VoxCeleb Speaker Recognition Challenge, including new datasets, evaluation protocols, and a comparison of current methods in unconstrained speaker recognition.

Findings

1. Significant improvements over the first challenge in speaker recognition accuracy

2. Effective baseline systems established for diarisation and recognition tasks

3. Insights into current challenges and future directions in speaker recognition

Abstract

We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2020. This paper outlines the challenge, and describes the baselines, methods used, and results. We conclude with a discussion of the progress over the first installment of the challenge.

Code & Models

Models

bigstorm/case-speaker-embedding-v2-512 (model · 5 dl · ♡ 1)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing