The ID R&D VoxCeleb Speaker Recognition Challenge 2023 System Description

Nikita Torgashov; Rostislav Makarov; Ivan Yakovlev; Pavel Malov; Andrei Balykin; Anton Okhotnikov

arXiv:2308.08294 · eess.AS · August 22, 2023

TL;DR

This paper describes ID R&D's first-place system for Track 2 (open) of VoxSRC-23, which fuses deep ResNets with self-supervised learning (SSL) based models trained on a mixture of VoxCeleb2 and a large version of VoxTube.

Contribution

The paper presents a fusion of deep ResNets and SSL-based models trained on a mixture of the VoxCeleb2 dataset and a large version of the VoxTube dataset, which placed first on the VoxSRC-23 Track 2 public leaderboard.

Findings

01

Achieved first place on the VoxSRC-23 Track 2 public leaderboard

02

minDCF(0.05) of 0.0762 and EER of 1.30%

03

Effective fusion of ResNets and SSL models

Abstract

This report describes the ID R&D team's submissions for Track 2 (open) of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our solution is based on the fusion of deep ResNets and self-supervised learning (SSL) based models trained on a mixture of the VoxCeleb2 dataset and a large version of the VoxTube dataset. The final submission to Track 2 achieved first place on the VoxSRC-23 public leaderboard with a minDCF(0.05) of 0.0762 and an EER of 1.30%.
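
The two metrics quoted in the abstract can be illustrated with a small sketch. This is not the challenge's official scoring code: the function names are hypothetical, the EER is approximated by the threshold where the miss and false-alarm rates are closest, and the cost parameters (C_miss = C_fa = 1 with a target prior of 0.05) are an assumption about what "minDCF(0.05)" denotes.

```python
def error_rates(scores, labels):
    """Sweep the decision threshold upward and return (P_miss, P_fa) lists.

    labels: 1 for a target (same-speaker) trial, 0 for a non-target trial.
    A trial is accepted when its score is above the threshold.
    """
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    if n_tar == 0 or n_non == 0:
        raise ValueError("need at least one target and one non-target trial")
    pairs = sorted(zip(scores, labels))  # ascending by score
    # Threshold below every score: everything accepted.
    p_miss, p_fa = [0.0], [1.0]
    miss, fa = 0, n_non
    i = 0
    while i < len(pairs):
        j = i
        # Move the threshold past all trials sharing this score (ties).
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                miss += 1   # a target is now rejected
            else:
                fa -= 1     # a non-target is no longer accepted
            j += 1
        p_miss.append(miss / n_tar)
        p_fa.append(fa / n_non)
        i = j
    return p_miss, p_fa


def equal_error_rate(scores, labels):
    """Approximate EER: mean of P_miss and P_fa where they are closest."""
    p_miss, p_fa = error_rates(scores, labels)
    k = min(range(len(p_miss)), key=lambda t: abs(p_miss[t] - p_fa[t]))
    return (p_miss[k] + p_fa[k]) / 2.0


def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost over all thresholds.

    p_target, c_miss and c_fa are assumed values, not taken from the report.
    """
    p_miss, p_fa = error_rates(scores, labels)
    costs = [c_miss * pm * p_target + c_fa * pf * (1.0 - p_target)
             for pm, pf in zip(p_miss, p_fa)]
    # Normalize by the cost of the best trivial system (accept all / reject all).
    return min(costs) / min(c_miss * p_target, c_fa * (1.0 - p_target))


# Toy trial list, made up for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_eer = equal_error_rate(toy_scores, toy_labels)
toy_min_dcf = min_dcf(toy_scores, toy_labels)
```

On the toy list one target trial scores below one non-target trial, so both metrics come out at roughly 0.25. With a low target prior such as 0.05, false alarms are weighted far more heavily than misses, which is why minDCF and EER can rank systems differently.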

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing