Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020
Hee Soo Heo, Bong-Jin Lee, Jaesung Huh, Joon Son Chung

TL;DR
This report presents ResNet-based speaker recognition systems submitted to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. The models improve significantly on most existing works without model ensembles or post-processing, and the training code and pre-trained models are released as unofficial baselines.
Contribution
It analyzes speaker recognition models based on the ResNet architecture and trains several variants with a range of loss functions, releasing the resulting code and models as unofficial baselines for the challenge.
Findings
Significant performance improvements over most existing works, without model ensembles or post-processing
Effective ResNet variants trained with different loss functions
Open-source code and pre-trained models provided
Abstract
This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models based on the popular ResNet architecture, and train a number of variants using a range of loss functions. Our results show significant improvements over most existing works without the use of model ensemble or post-processing. We release the training code and pre-trained models as unofficial baselines for this year's challenge.
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
