The IDLAB VoxCeleb Speaker Recognition Challenge 2021 System Description
Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

TL;DR
This paper describes a speaker recognition system for the VoxCeleb Speaker Recognition Challenge 2021. The system combines hybrid neural network architectures, specialized training techniques, and cross-lingual compensation to improve verification accuracy on short-duration and cross-lingual trials.
Contribution
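The authors fuse hybrid ECAPA-TDNN and SE-ResNet models. The ECAPA-TDNN is enhanced with a 2D convolutional stem, and the SE-ResNets with absolute frequency positional encoding. All models are trained with a mini-batch sampling technique that builds batches from the data the system finds most challenging.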
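Convolutional layers are largely translation-invariant along the frequency axis, so adding absolute frequency positional information lets a ResNet tell low-frequency bands from high-frequency ones. The sketch below is only an illustration under stated assumptions. It uses a Transformer-style sinusoidal encoding, which is assumed here; the paper says only that absolute frequency positional information is incorporated. The encoding is added to a (batch, channels, frequency, time) feature map. The function names are hypothetical.

```python
import numpy as np

def frequency_positional_encoding(n_freq: int, n_channels: int) -> np.ndarray:
    """Sinusoidal encoding over the frequency axis, shape (n_channels, n_freq).

    Assumption: Transformer-style sin/cos pairs; not confirmed as the paper's form.
    """
    positions = np.arange(n_freq)[None, :]                  # (1, F) frequency bin index
    dims = np.arange(n_channels)[:, None]                   # (C, 1) channel index
    rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / n_channels)
    angles = positions * rates                              # (C, F)
    # Even channels use sin, odd channels use cos.
    enc = np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))
    return enc.astype(np.float32)

def add_frequency_encoding(feature_map: np.ndarray) -> np.ndarray:
    """Add the encoding to a (batch, channels, freq, time) map.

    The same values are broadcast over the batch and time axes, so only the
    frequency position (and channel) determines the offset.
    """
    _, c, f, _ = feature_map.shape
    enc = frequency_positional_encoding(f, c)
    return feature_map + enc[None, :, :, None]
```

Because the encoding depends only on the frequency bin, each time frame of each utterance receives an identical offset. A convolution applied afterwards can therefore condition on which band it is looking at.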
Findings
Achieved third place on the VoxSRC-21 leaderboard.
Final system fusion reduced minDCF to approximately 0.13.
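minDCF is the normalized detection cost minimized over all decision thresholds. The sketch below computes it from raw trial scores. It assumes a target prior of P_target = 0.05 with unit miss and false-alarm costs, a setting commonly used in VoxSRC. The function name and defaults are illustrative. Tied scores are not handled specially.

```python
import numpy as np

def min_dcf(scores, labels, p_target=0.05, c_miss=1.0, c_fa=1.0):
    """Normalized minimum detection cost over all score thresholds.

    scores: higher means "same speaker"; labels: True for target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    sorted_labels = labels[np.argsort(scores)]
    n_tar = sorted_labels.sum()
    n_non = len(sorted_labels) - n_tar
    # Threshold at position i rejects the i lowest-scoring trials.
    miss = np.concatenate([[0], np.cumsum(sorted_labels)]) / n_tar
    fa = (n_non - np.concatenate([[0], np.cumsum(~sorted_labels)])) / n_non
    dcf = c_miss * p_target * miss + c_fa * (1 - p_target) * fa
    # Normalize by the cost of always accepting or always rejecting.
    c_default = min(c_miss * p_target, c_fa * (1 - p_target))
    return float(dcf.min() / c_default)
```

For example, perfectly separated scores such as `min_dcf([0.1, 0.2, 0.8, 0.9], [False, False, True, True])` give 0.0. A value of 1.0 means the system does no better than the trivial always-reject decision.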
Enhanced models performed well on short and cross-lingual test conditions.
Abstract
This technical report describes the IDLab submission for tracks 1 and 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). This speaker verification competition focuses on short duration test recordings and cross-lingual trials. Currently, both Time Delay Neural Networks (TDNNs) and ResNets achieve state-of-the-art results in speaker verification. We opt to use a system fusion of hybrid architectures in our final submission. An ECAPA-TDNN baseline is enhanced with a 2D convolutional stem to transfer some of the strong characteristics of a ResNet-based model to this hybrid CNN-TDNN architecture. Similarly, we incorporate absolute frequency positional information in the SE-ResNet architectures. All models are trained with a special mini-batch data sampling technique which constructs mini-batches with data that is the most challenging for the system on the level of…
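The abstract is truncated before it states at which level the "most challenging" data is selected, so the following is not the authors' procedure. It is one hypothetical interpretation: group mutually confusable speakers into the same mini-batch. The sketch assumes that each speaker has a prototype embedding, for example the class weights of a softmax-based classification head. It also assumes that confusability is measured by cosine similarity between prototypes. The function name and parameters are illustrative.

```python
import numpy as np

def hard_speaker_batches(prototypes, speakers_per_batch, rng=None):
    """Yield lists of speaker indices, each grouping mutually similar speakers.

    Hypothetical sketch: prototypes is an (n_speakers, dim) array, assumed to be
    per-speaker embeddings such as classifier weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = protos @ protos.T                      # cosine similarity between speakers
    remaining = set(range(len(protos)))
    while len(remaining) >= speakers_per_batch:
        # Pick a random seed speaker, then its most similar unused speakers.
        seed = int(rng.choice(sorted(remaining)))
        candidates = np.array(sorted(remaining - {seed}), dtype=int)
        order = np.argsort(-sim[seed, candidates])[: speakers_per_batch - 1]
        batch = [seed] + [int(i) for i in candidates[order]]
        remaining -= set(batch)
        yield batch
```

In a training loop, utterances would then be drawn for each speaker in a batch. The prototypes would need periodic refreshing as the model changes. Placing confusable speakers together makes the classification margin work on the hardest distinctions rather than on easy, randomly paired speakers.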
Taxonomy
Methods: Average Pooling · Global Average Pooling · Residual Block · Max Pooling · 1x1 Convolution · Kaiming Initialization · Convolution · Residual Connection · Batch Normalization
