ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency
Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li

TL;DR
This paper introduces ERes2NetV2, a computationally efficient model that improves short-duration speaker verification accuracy through enhanced feature fusion and pruning of redundant structures, outperforming previous models on the VoxCeleb datasets.
Contribution
The paper proposes ERes2NetV2, an improved version of ERes2Net with expanded channels and pruning, achieving better short-duration speaker verification performance with reduced complexity.
Findings
Achieves 0.98% EER on 3s trials
Reduces model parameters and computational cost
Outperforms previous models on VoxCeleb datasets
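The equal error rate (EER) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The snippet below is a minimal sketch of how that metric is commonly computed from trial scores, not the paper's evaluation script; the function name `equal_error_rate` and the toy scores are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Return the EER as a fraction, given similarity scores and 0/1 labels.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]
    nontarget = scores[labels == 0]

    # Sweep the decision threshold over every observed score.
    thresholds = np.unique(scores)
    # False rejection: target trials that score below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: non-target trials that score at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2

# Toy trial list (illustrative values only, not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(f"EER = {equal_error_rate(toy_scores, toy_labels):.2%}")
```

With a finite trial list the two error curves rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate instead.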
Abstract
Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion demonstrates sub-optimal performance in short-duration speaker verification. To further improve the short-duration feature extraction capability of ERes2Net, we expand the channel dimension within each stage. However, this modification also increases the number of model parameters and computational complexity. To alleviate this problem, we propose an improved ERes2NetV2 by pruning redundant structures, ultimately reducing both the model parameters and its computational cost. A range of…
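The trade-off the abstract describes, where wider channels raise the parameter count, follows from how convolution weights scale. The sketch below is only a back-of-the-envelope illustration: the helper `conv2d_params` and the channel widths 64 and 128 are hypothetical and are not the configuration used in ERes2NetV2.

```python
def conv2d_params(c_in, c_out, kernel=3, bias=False):
    """Number of learnable parameters in a standard 2-D convolution."""
    weights = kernel * kernel * c_in * c_out
    return weights + (c_out if bias else 0)

# Hypothetical widths, chosen only to show the scaling.
base = conv2d_params(64, 64)      # 3 * 3 * 64 * 64
wide = conv2d_params(128, 128)    # 3 * 3 * 128 * 128

print(base, wide, wide / base)
```

Doubling both the input and output channels quadruples the weights of a single layer, which is why channel expansion alone is costly and why removing redundant structures afterwards can recover much of the budget.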
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
Methods: 1x1 Convolution · Residual Connection · Res2Net Block · Kaiming Initialization · Average Pooling · Global Average Pooling · Batch Normalization · Convolution · Res2Net
