The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai, Guoqiang Hong, Zhijian Ye, Ximin Li, Haizhou Li

TL;DR
This paper presents the Kriston AI system for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which combines multiple models and techniques and ranked 2nd in each of tracks 1, 2 and 4.
Contribution
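The system combines several ResNet variants (track 1), adds three fine-tuned pre-trained models (track 2), and uses a diarisation pipeline of VAD, speaker embedding extraction, AHC, Bayesian hidden Markov model re-clustering, and overlapped speech detection and handling (track 4).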
Findings
Ranked 2nd in each of the three tracks entered (tracks 1, 2 and 4) of VoxSRC-22.
Attained a minDCF of 0.090 and EER of 1.401% in track 1.
Attained a minDCF of 0.072 and EER of 1.119% in track 2.
Achieved a diarisation error rate of 4.86% in track 4.
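For readers unfamiliar with the two verification metrics quoted above, the following is a minimal sketch of how EER and minDCF can be computed from a list of trial scores. It is not the challenge's official scoring tool. The function names (`error_rates`, `eer`, `min_dcf`) are hypothetical, and the cost parameters (`p_target=0.05`, `c_miss=1`, `c_fa=1`) are assumed defaults that should be checked against the evaluation plan. Tied scores are not treated specially.

```python
import numpy as np


def error_rates(scores, labels):
    """Miss rate and false-alarm rate at every candidate threshold.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Index 0 corresponds to accepting every trial; the last index to
    rejecting every trial.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    sorted_labels = labels[np.argsort(scores)]  # ascending score order
    n_target = sorted_labels.sum()
    n_nontarget = len(sorted_labels) - n_target
    # Rejecting the i lowest-scoring trials misses the targets among them.
    fnr = np.concatenate([[0.0], np.cumsum(sorted_labels) / n_target])
    fpr = np.concatenate(
        [[1.0], 1.0 - np.cumsum(1 - sorted_labels) / n_nontarget]
    )
    return fnr, fpr


def eer(scores, labels):
    """Equal error rate: the point where miss and false-alarm rates meet."""
    fnr, fpr = error_rates(scores, labels)
    idx = np.argmin(np.abs(fnr - fpr))
    return (fnr[idx] + fpr[idx]) / 2.0


def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Minimum normalised detection cost over all thresholds.

    p_target, c_miss and c_fa are assumed values, not taken from the report.
    """
    fnr, fpr = error_rates(scores, labels)
    cost = c_miss * p_target * fnr + c_fa * (1.0 - p_target) * fpr
    # Normalise by the cost of the better trivial system
    # (always accept or always reject).
    return cost.min() / min(c_miss * p_target, c_fa * (1.0 - p_target))


if __name__ == "__main__":
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    demo_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print("EER:", eer(demo_scores, demo_labels))
    print("minDCF:", min_dcf(demo_scores, demo_labels))
```

Both metrics sweep the same set of thresholds; EER weights the two error types equally, while minDCF weights them by the assumed prior and costs, which is why a system can rank differently under the two.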
Abstract
This technical report describes our system for tracks 1, 2 and 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). By combining several ResNet variants, our submission for track 1 attained a minDCF of 0.090 with an EER of 1.401%. By further incorporating three fine-tuned pre-trained models, our submission for track 2 achieved a minDCF of 0.072 with an EER of 1.119%. For track 4, our system consisted of voice activity detection (VAD), speaker embedding extraction, and agglomerative hierarchical clustering (AHC), followed by a re-clustering step based on a Bayesian hidden Markov model and by overlapped speech detection and handling. Our submission for track 4 achieved a diarisation error rate (DER) of 4.86%. All three submissions ranked 2nd in their respective tracks.
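The AHC stage of the track 4 pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the linkage rule, distance measure, or stopping threshold, so average linkage, cosine distance, and the `threshold` value below are all assumptions, and the function names are hypothetical. The Bayesian HMM re-clustering and overlap handling that follow AHC in the described system are omitted.

```python
import numpy as np


def cosine_distance_matrix(embeddings):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    emb = np.asarray(embeddings, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def ahc(embeddings, threshold):
    """Average-linkage agglomerative clustering of segment embeddings.

    Starts with one cluster per segment and repeatedly merges the two
    closest clusters until the smallest inter-cluster distance exceeds
    `threshold` (an assumed tuning parameter). Returns one integer
    cluster label per segment.
    """
    dist = cosine_distance_matrix(embeddings)
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best_a, best_b, best_d = None, None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best_d:
                    best_a, best_b, best_d = a, b, d
        if best_d > threshold:
            break  # remaining clusters are treated as distinct speakers
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]
    labels = np.empty(len(dist), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


if __name__ == "__main__":
    # Four toy 2-D "embeddings": two segments per speaker.
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(ahc(toy, threshold=0.5))
```

The stopping threshold determines how many speakers the clustering reports, so in a pipeline like the one described, a later re-clustering step can correct an over- or under-estimated speaker count.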
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Batch Normalization · 1x1 Convolution · Max Pooling · Residual Connection · Average Pooling · Residual Block · Bottleneck Residual Block · Convolution · Global Average Pooling
