VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

TL;DR
The VoxSRC 2020 challenge evaluated the state of the art in speaker recognition and diarisation using unconstrained YouTube data, providing datasets, benchmarks, and insights into progress since the previous challenge.
Contribution
This paper introduces the second VoxCeleb Speaker Recognition Challenge, including new datasets, evaluation protocols, and a comparison of current methods in unconstrained speaker recognition.
Findings
Significant improvements over the first challenge in speaker recognition accuracy
Effective baseline systems established for diarisation and recognition tasks
Insights into current challenges and future directions in speaker recognition
Abstract
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020. The goal of this challenge was to assess how well current speaker recognition technology is able to diarise and recognize speakers in unconstrained or 'in the wild' data. It consisted of: (i) a publicly available speaker recognition and diarisation dataset from YouTube videos together with ground truth annotation and standardised evaluation software; and (ii) a virtual public challenge and workshop held at Interspeech 2020. This paper outlines the challenge, and describes the baselines, methods used, and results. We conclude with a discussion of the progress over the first installment of the challenge.
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
