GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection
Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma

TL;DR
GMM-ResNet2 is an ensemble of group ResNet networks that combines multi-scale GMM features with an ensemble-aware loss for synthetic speech detection, and it is reported to outperform baseline models on the ASVspoof 2019 and 2021 datasets.
Contribution
The paper presents GMM-ResNet2, a novel ensemble model with multi-scale GMM features, group classification, and an ensemble-aware loss, advancing synthetic speech detection accuracy.
Findings
Achieves state-of-the-art results on ASVspoof 2019 and 2021 datasets.
Reduces EER and t-DCF significantly compared to baseline models.
Demonstrates the effectiveness of multi-scale GMM features and ensemble loss in detection tasks.
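For readers unfamiliar with the first metric named above, the following is a minimal sketch of how an equal error rate (EER) can be estimated from detector scores. It assumes that higher scores mean "more likely bona fide" and uses a simple threshold sweep; the function name and the toy scores are illustrative and are not taken from the paper. The t-DCF is not shown, since it additionally depends on the error rates of a speaker verification system and on cost parameters that this summary does not give.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    # Extra threshold above every score, so "reject everything" is also considered.
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance rate: spoofed trials accepted as bona fide.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: bona fide trials rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Toy scores (hypothetical): one bona fide trial scores below one spoofed trial.
print(equal_error_rate([0.9, 0.4], [0.6, 0.1]))
```

A lower EER means the bona fide and spoofed score distributions overlap less; perfectly separated scores give an EER of 0.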
Abstract
Deep learning models are widely used for speaker recognition and spoofed speech detection. We propose GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. First, GMMs of different orders differ in their ability to form smooth approximations of the feature distribution, so multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Second, a grouping technique is used to improve classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by averaging the outputs of all group classifiers. Third, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate…
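The first two improvements can be illustrated with a small sketch: per-component log Gaussian probabilities computed under several GMMs of different orders, and a final score obtained by averaging group classifier outputs. Everything here is an assumption made for illustration: the GMM orders (2, 4, 8), the diagonal covariances, the random parameters standing in for trained GMMs, and the helper names are hypothetical. The abstract does not say whether mixture weights or any normalisation enter the feature, so this sketch omits both, and the ResNet classifiers themselves are not modelled.

```python
import math
import random


def log_gaussian_probs(frame, means, variances):
    """Log density of one frame under each diagonal-covariance Gaussian component."""
    out = []
    for mu, var in zip(means, variances):
        ll = 0.0
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        out.append(ll)
    return out


def multi_scale_lgp(frames, gmms):
    """Return one feature matrix (frames x components) per GMM."""
    return [
        [log_gaussian_probs(f, g["means"], g["variances"]) for f in frames]
        for g in gmms
    ]


def average_scores(group_scores):
    """Combine group classifier outputs by simple averaging."""
    return sum(group_scores) / len(group_scores)


def random_gmm(order, dim, rng):
    # Hypothetical stand-in for a GMM trained on real acoustic features.
    return {
        "means": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(order)],
        "variances": [[1.0] * dim for _ in range(order)],
    }


rng = random.Random(0)
dim = 4  # toy feature dimension
frames = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(10)]
gmms = [random_gmm(order, dim, rng) for order in (2, 4, 8)]

features = multi_scale_lgp(frames, gmms)
shapes = [(len(m), len(m[0])) for m in features]
print(shapes)  # → [(10, 2), (10, 4), (10, 8)]
print(average_scores([1.0, 2.0, 3.0]))  # → 2.0
```

Each GMM yields a feature matrix whose width equals its number of components, which is what makes the features "multi-scale": a low-order GMM gives a coarse view of the feature distribution and a high-order GMM a finer one.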
