Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020

Hee Soo Heo; Bong-Jin Lee; Jaesung Huh; Joon Son Chung

arXiv:2009.14153 · eess.AS · September 30, 2020 · 97 cites

TL;DR

This report presents a ResNet-based speaker recognition system for the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. It reports significant improvements over most existing works without model ensembling or post-processing, and releases the training code and pre-trained models.

Contribution

It analyzes speaker recognition models based on the ResNet architecture, trains several variants with a range of loss functions, and releases them as unofficial baselines for the challenge.

Findings

01

Significant improvements over most previous works, without model ensembling or post-processing

02

Effective ResNet variants trained with different loss functions

03

Open-source code and pre-trained models provided

Abstract

This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020. We perform a careful analysis of speaker recognition models based on the popular ResNet architecture, and train a number of variants using a range of loss functions. Our results show significant improvements over most existing works without the use of model ensemble or post-processing. We release the training code and pre-trained models as unofficial baselines for this year's challenge.

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing