Improving speaker verification robustness with synthetic emotional utterances
Nikhil Kumar Koditala, Chelsea Jui-Ting Ju, Ruirui Li, Minho Jin, Aman Chadha, Andreas Stolcke

TL;DR
This paper introduces a CycleGAN-based data augmentation method to generate synthetic emotional speech, enhancing speaker verification systems' robustness across emotional states and reducing error rates.
Contribution
The study presents a novel use of CycleGAN for synthesizing emotional speech data, improving speaker verification accuracy in emotional scenarios.
Findings
Synthetic emotional data improves verification accuracy
Reduced equal error rate by up to 3.64%
Enhanced robustness across emotional speech variations
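For readers unfamiliar with the metric behind these findings, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal illustration using NumPy, not the paper's evaluation code; the function name `equal_error_rate` and the toy scores are assumptions made for this example.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from verification scores (hypothetical helper).

    target_scores: scores for trials where the speaker matches.
    nontarget_scores: scores for impostor trials.
    Higher scores are assumed to mean "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: genuine trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scoring at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy scores, invented purely for illustration.
separable = equal_error_rate([0.9, 0.8], [0.1, 0.2])
overlapping = equal_error_rate([0.6, 0.4], [0.5, 0.3])
print(separable, overlapping)
```

A lower EER means fewer errors at the balanced operating point, so a reduction in EER on emotional test utterances indicates a more robust verifier.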
Abstract
A speaker verification (SV) system offers an authentication service designed to confirm whether a given speech sample originates from a specific speaker. This technology has paved the way for various personalized applications that cater to individual preferences. A noteworthy challenge faced by SV systems is their ability to perform consistently across a range of emotional spectra. Most existing models exhibit high error rates when dealing with emotional utterances compared to neutral ones. Consequently, this phenomenon often leads to missing out on speech of interest. This issue primarily stems from the limited availability of labeled emotional speech data, impeding the development of robust speaker representations that encompass diverse emotional states. To address this concern, we propose a novel approach employing the CycleGAN framework to serve as a data augmentation method. This…
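The abstract is cut off before describing the training objective, but the methods tagged for this paper include a cycle consistency loss and a least-squares GAN loss. The following is a minimal sketch of those two standard CycleGAN ingredients, assuming two hypothetical mapping functions (`g_n2e` for neutral to emotional, `g_e2n` for the reverse) that operate on feature arrays. It is not the authors' implementation, and the weighting and network details used in the paper are not stated here.

```python
import numpy as np

def cycle_consistency_loss(x_neutral, y_emotional, g_n2e, g_e2n):
    """L1 penalty for failing to recover the input after a round trip."""
    recon_neutral = g_e2n(g_n2e(x_neutral))        # neutral -> emotional -> neutral
    recon_emotional = g_n2e(g_e2n(y_emotional))    # emotional -> neutral -> emotional
    return (np.mean(np.abs(recon_neutral - x_neutral))
            + np.mean(np.abs(recon_emotional - y_emotional)))

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss: push real outputs toward 1 and fake outputs toward 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Least-squares loss: the generator wants fake outputs scored as 1."""
    return np.mean((d_fake - 1.0) ** 2)

# Toy stand-ins for the two generators (assumed, perfectly invertible).
g_n2e = lambda x: x + 1.0
g_e2n = lambda y: y - 1.0

x = np.ones((2, 3))        # placeholder "neutral" features
y = np.full((2, 3), 2.0)   # placeholder "emotional" features
print(cycle_consistency_loss(x, y, g_n2e, g_e2n))
```

The cycle term is what lets the two mappings be trained on unpaired neutral and emotional utterances: each generator is only required to be undone by the other, not to match a specific target recording.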
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Batch Normalization · GAN Least Squares Loss · Cycle Consistency Loss · Residual Connection · Residual Block · Sigmoid Activation · Convolution · PatchGAN
