The ID R&D VoxCeleb Speaker Recognition Challenge 2023 System Description
Nikita Torgashov, Rostislav Makarov, Ivan Yakovlev, Pavel Malov, Andrei Balykin, Anton Okhotnikov

TL;DR
This paper describes ID R&D's first-place system for VoxSRC-23 Track 2 (open), which fuses deep ResNets and self-supervised learning (SSL) models trained on a mixture of VoxCeleb2 and a large version of VoxTube.
Contribution
The paper presents a fusion of deep ResNets and SSL-based models trained on a mixture of VoxCeleb2 and a large version of VoxTube, which ranked first on the VoxSRC-23 public leaderboard for Track 2.
Findings
Achieved first place on the VoxSRC-23 public leaderboard (Track 2, open)
minDCF(0.05) of 0.0762, EER of 1.30%
Effective fusion of ResNets and SSL models
Abstract
This report describes the ID R&D team's submissions to Track 2 (open) of the VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23). Our solution is based on the fusion of deep ResNets and self-supervised learning (SSL) based models trained on a mixture of the VoxCeleb2 dataset and a large version of the VoxTube dataset. The final Track 2 submission achieved first place on the VoxSRC-23 public leaderboard with a minDCF(0.05) of 0.0762 and an EER of 1.30%.
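The two metrics quoted in the abstract, and the general idea of score-level fusion, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the systems are fused, so the z-normalized weighted average in `fuse_scores` is only an assumed stand-in, and all function names are hypothetical. The minDCF here takes a target prior of 0.05 (from the "minDCF(0.05)" notation) and assumes unit costs for misses and false alarms. Tied scores are not handled specially.

```python
import numpy as np


def error_rates(scores, labels):
    """Miss and false-alarm rates for every threshold between sorted scores.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    labels = labels[np.argsort(scores)]  # ascending score order
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    # Entry k corresponds to rejecting the k lowest-scoring trials.
    fnr = np.concatenate(([0], np.cumsum(labels))) / n_tar
    fpr = np.concatenate(([n_non], n_non - np.cumsum(1 - labels))) / n_non
    return fnr, fpr


def eer(scores, labels):
    """Equal error rate: operating point where miss and false-alarm rates meet."""
    fnr, fpr = error_rates(scores, labels)
    idx = np.argmin(np.abs(fnr - fpr))
    return (fnr[idx] + fpr[idx]) / 2


def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalized detection cost (unit costs are an assumption)."""
    fnr, fpr = error_rates(scores, labels)
    dcf = c_miss * p_target * fnr + c_fa * (1 - p_target) * fpr
    # Normalize by the cost of the best trivial accept-all / reject-all system.
    return dcf.min() / min(c_miss * p_target, c_fa * (1 - p_target))


def fuse_scores(system_scores, weights=None):
    """Assumed fusion scheme: z-normalize each system, then weighted average."""
    s = np.asarray(system_scores, dtype=float)  # shape (n_systems, n_trials)
    s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
    if weights is None:
        w = np.full(len(s), 1.0 / len(s))
    else:
        w = np.asarray(weights, dtype=float)
    return w @ s
```

In practice, fusion weights would typically be tuned on a held-out development trial list, and the fused scores passed to `eer` and `min_dcf` to compare against the individual systems.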
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
