Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions
Santi Prieto, Alfonso Ortega, Iván López-Espejo, Eduardo Lleida

TL;DR
This paper introduces a linear compensation method using Gaussian mixture models to improve speaker verification accuracy across different vocal effort conditions, such as shouted versus normal speech.
Contribution
It presents a novel application of GMM-based compensation techniques from speech recognition to address vocal effort mismatch in speaker verification systems.
Findings
Up to 13.8% relative improvement in equal error rate (EER) with compensation
Effective shouted speech detection using logistic regression
Back-end compensation enhances robustness to vocal effort variations
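For readers unfamiliar with the metric, the following is a minimal sketch of how an equal error rate and a relative improvement figure are typically computed from verification scores. The function names and the toy scores are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation tooling may differ.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where the false rejection
    rate (FRR) and the false acceptance rate (FAR) are closest."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds are the observed scores themselves.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    # A trial is accepted when its score is >= threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])

def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer

# Toy scores (illustrative only, not results from the paper).
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.4, 0.6])
```

A relative figure such as 13.8% describes the reduction as a fraction of the baseline EER, not a drop of 13.8 absolute percentage points.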
Abstract
The performance of speaker verification systems degrades when vocal effort conditions differ between enrollment and test (e.g., shouted vs. normal speech). This situation can arise in non-cooperative speaker verification tasks. In this paper, we present a study of different methods for linear compensation of embeddings that use Gaussian mixture models to cluster the shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate for the mismatch between shouted and normal conditions in speaker verification. Before compensation, the shouted condition is automatically detected by means of logistic regression. The process is computationally light and is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach…
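The pipeline described in the abstract (detect the shouted condition, then linearly compensate the embedding using a GMM) can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: it assumes a diagonal-covariance GMM whose parameters are already trained, one bias vector per Gaussian component, a posterior-weighted bias subtraction as the linear compensation rule, and a fixed 0.5 detection threshold. All function and parameter names are hypothetical.

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior p(k | x) under a diagonal-covariance GMM.
    Shapes: x (D,), weights (K,), means (K, D), variances (K, D)."""
    diff = x[None, :] - means
    log_lik = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()          # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def shouted_probability(x, w, b):
    """Logistic regression score: probability that x comes from shouted speech."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def compensate(x, weights, means, variances, biases):
    """Assumed compensation rule: subtract the posterior-weighted sum of
    per-component bias vectors (biases has shape (K, D))."""
    post = gmm_posteriors(x, weights, means, variances)
    return x - post @ biases

def process_embedding(x, detector, gmm, biases, threshold=0.5):
    """Compensate only the embeddings detected as shouted."""
    w, b = detector
    if shouted_probability(x, w, b) < threshold:
        return x                        # treated as normal speech, left unchanged
    return compensate(x, *gmm, biases)

# Toy 2-D example with a single Gaussian component (illustrative values only).
gmm = (np.array([1.0]), np.array([[3.0, 0.0]]), np.array([[1.0, 1.0]]))
biases = np.array([[1.0, 0.0]])
detector = (np.array([1.0, 0.0]), -2.0)
shouted_out = process_embedding(np.array([3.0, 0.0]), detector, gmm, biases)
normal_out = process_embedding(np.array([0.0, 0.0]), detector, gmm, biases)
```

In this toy setup the first embedding is flagged as shouted and shifted by the component bias, while the second falls below the threshold and passes through untouched. Because each step is a handful of vector operations on a fixed-size embedding, a scheme of this kind is consistent with the abstract's description of a computationally light back-end process.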
