GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Hui Yan, Zhenchun Lei, Changhong Liu, Yong Zhou

TL;DR
This paper introduces GMM-ResNext, a hybrid speaker verification model that feeds GMM-derived log Gaussian probability features (the generative part) into a discriminative ResNext backbone, and reports lower equal error rates (EER) than ResNet34 and ECAPA-TDNN.
Contribution
It proposes a novel GMM-ResNext model that integrates generative and discriminative approaches for enhanced speaker verification performance.
Findings
Achieves 48.1% relative EER reduction over ResNet34.
Achieves 11.3% relative EER reduction over ECAPA-TDNN.
Effective use of log Gaussian probability features improves model accuracy.
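The relative EER reductions above are presumably computed in the usual way, as the drop in EER divided by the baseline EER. A minimal sketch of that arithmetic follows; the function name and the example EER values are made up for illustration and are not results from the paper.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction in percent.

    Computed as 100 * (baseline - new) / baseline, the conventional
    definition of a relative error reduction.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical EERs (in percent) for illustration only; not from the paper.
example = relative_eer_reduction(baseline_eer=2.0, new_eer=1.5)
print(f"{example:.1f}% relative EER reduction")
```

Note that a relative reduction says nothing about the absolute EER level, so the same percentage can correspond to very different absolute gains.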
Abstract
With the development of deep learning, many different network architectures have been explored in speaker verification. However, most systems rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in automatic speaker verification (ASV) tasks. In this paper, we propose the GMM-ResNext model for speaker verification. The conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. We therefore extract log Gaussian probability features from the raw acoustic features and use a ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines generative and discriminative models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model…
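To make the idea of log Gaussian probability features concrete, the sketch below scores every frame against every Gaussian component, producing one feature vector of length K per frame. It assumes diagonal covariances, ignores mixture weights, and applies no normalization; these choices, and all names in the code, are assumptions for illustration and may differ from the paper's actual feature definition.

```python
import math


def log_gaussian_features(frames, means, variances):
    """Score each frame against each diagonal-covariance Gaussian.

    frames:    list of T frames, each a list of D floats
    means:     list of K component means, each a list of D floats
    variances: list of K diagonal variances, each a list of D floats
    Returns a T x K list of lists holding log N(x_t | mu_k, diag(var_k)).
    """
    features = []
    for x in frames:
        row = []
        for mu, var in zip(means, variances):
            dim = len(x)
            # Log-determinant of the diagonal covariance matrix.
            log_det = sum(math.log(v) for v in var)
            # Squared Mahalanobis distance under the diagonal covariance.
            maha = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
            row.append(-0.5 * (dim * math.log(2 * math.pi) + log_det + maha))
        features.append(row)
    return features


# Toy input: 2 frames of dimension 2, scored against 3 hypothetical components.
frames = [[0.0, 0.0], [1.0, -1.0]]
means = [[0.0, 0.0], [1.0, -1.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
feats = log_gaussian_features(frames, means, variances)
```

Each row of `feats` describes how one frame is distributed over all components, which is the per-frame information the abstract says a conventional GMM discards. A practical implementation would vectorize this loop and feed the resulting T x K matrix to the network.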
