GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

Hui Yan; Zhenchun Lei; Changhong Liu; Yong Zhou

arXiv:2407.03135·cs.SD·July 4, 2024

TL;DR

This paper introduces GMM-ResNext, a hybrid model that feeds GMM-derived log Gaussian probability features into a ResNext-based backbone, reporting relative EER reductions of 48.1% over ResNet34 and 11.3% over ECAPA-TDNN.

Contribution

The paper proposes GMM-ResNext, which pairs a generative GMM front end (log Gaussian probability features computed from the raw acoustic features) with a discriminative ResNext-based backbone that extracts the speaker embedding.

Findings

01

Achieves 48.1% relative EER reduction over ResNet34.

02

Achieves 11.3% relative EER reduction over ECAPA-TDNN.

03

Using log Gaussian probability features as the network input improves verification accuracy.
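The first two findings are stated as relative EER reductions, which can be easy to confuse with absolute differences. A minimal sketch of the usual definition follows; the function name and the EER values in the usage line are hypothetical and chosen only for illustration, since this page does not list the absolute EERs.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline.

    Both arguments must be in the same unit (for example, percent EER).
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical values, NOT taken from the paper.
print(relative_eer_reduction(2.0, 1.5))  # 25.0
```

Under this definition, a 48.1% relative reduction means the new EER is 51.9% of the baseline EER, whatever the baseline value is.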

Abstract

With the development of deep learning, many different network architectures have been explored in speaker verification. However, most systems rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in automatic speaker verification (ASV) tasks. In this paper, we propose the GMM-ResNext model for speaker verification. The conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. We therefore extract log Gaussian probability features from the raw acoustic features and use a ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines generative and discriminative models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model…
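The abstract's log Gaussian probability features can be pictured as a frames-by-components matrix: each frame is scored against every Gaussian component, and the resulting sequence keeps the frame order for the backbone to model. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes diagonal covariances, omits mixture weights and any feature normalization, and uses a hypothetical function name with random stand-in data.

```python
import numpy as np


def log_gaussian_prob_features(frames, means, variances):
    """Per-frame log density under each Gaussian component.

    frames:    (T, D) acoustic feature frames
    means:     (K, D) component means
    variances: (K, D) diagonal covariances (an assumption of this sketch)
    Returns a (T, K) array whose entry [t, k] is log N(x_t; mu_k, diag(var_k)).
    """
    frames = np.asarray(frames, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dim = frames.shape[1]
    # Broadcast to (T, K, D): difference of every frame to every component mean.
    diff = frames[:, None, :] - means[None, :, :]
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    log_det = np.sum(np.log(variances), axis=1)                      # (K,)
    return -0.5 * (dim * np.log(2.0 * np.pi) + log_det[None, :] + mahalanobis)


# Random stand-in data: 5 frames, 3-dimensional features, 4 components.
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 3))
means = rng.normal(size=(4, 3))
variances = np.ones((4, 3))
lgp = log_gaussian_prob_features(frames, means, variances)
print(lgp.shape)  # (5, 4)
```

In a real system the means and variances would come from a GMM trained on acoustic features, and the (T, K) matrix would replace the raw features as the network input.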
