Improving speaker verification robustness with synthetic emotional utterances

Nikhil Kumar Koditala; Chelsea Jui-Ting Ju; Ruirui Li; Minho Jin; Aman Chadha; Andreas Stolcke

arXiv:2412.00319 · cs.SD · December 3, 2024

TL;DR

This paper introduces a CycleGAN-based data augmentation method to generate synthetic emotional speech, enhancing speaker verification systems' robustness across emotional states and reducing error rates.

Contribution

The study presents a novel use of CycleGAN for synthesizing emotional speech data, improving speaker verification accuracy in emotional scenarios.

Findings

01

Synthetic emotional data improves verification accuracy

02

Reduced equal error rate (EER) by up to 3.64% relative

03

Enhanced robustness across emotional speech variations
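
The equal error rate cited in the findings is the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how it can be estimated from verification trial scores, assuming higher scores mean "same speaker". The function name and the toy scores are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from same-speaker and different-speaker trial scores.

    Illustrative helper (not from the paper). Assumes a higher score means
    the system is more confident that the two utterances share a speaker.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy scores (made up for illustration only).
eer = equal_error_rate([0.6, 0.4], [0.5, 0.3])
print(f"EER = {eer:.2f}")
```

On a finite trial list the two error curves rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits may interpolate instead.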

Abstract

A speaker verification (SV) system offers an authentication service designed to confirm whether a given speech sample originates from a specific speaker. This technology has paved the way for various personalized applications that cater to individual preferences. A noteworthy challenge faced by SV systems is their ability to perform consistently across a range of emotional spectra. Most existing models exhibit high error rates when dealing with emotional utterances compared to neutral ones. Consequently, this phenomenon often leads to missing out on speech of interest. This issue primarily stems from the limited availability of labeled emotional speech data, impeding the development of robust speaker representations that encompass diverse emotional states. To address this concern, we propose a novel approach employing the CycleGAN framework to serve as a data augmentation method. This…

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Batch Normalization · GAN Least Squares Loss · Cycle Consistency Loss · Residual Connection · Residual Block · Sigmoid Activation · Convolution · PatchGAN
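
Two of the methods listed above, GAN Least Squares Loss and Cycle Consistency Loss, are the standard CycleGAN training objectives. The following is a minimal NumPy sketch of their usual textbook forms, assuming real/fake targets of 1 and 0 and an L1 reconstruction penalty. The function names, the 0.5 scaling factor, and the toy arrays are assumptions for illustration; the paper's exact formulation and weighting may differ.

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares GAN loss for the discriminator.

    Pushes discriminator outputs on real samples toward 1 and on
    generated samples toward 0. The 0.5 factor is a common convention.
    """
    return 0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

def lsgan_generator_loss(d_fake):
    """Least-squares GAN loss for the generator: wants D(fake) close to 1."""
    return np.mean((d_fake - 1.0) ** 2)

def cycle_consistency_loss(x, x_cycled, y, y_cycled):
    """L1 distance between each input and its round-trip reconstruction.

    x_cycled stands for F(G(x)) and y_cycled for G(F(y)), where G maps
    domain X to Y (e.g. neutral to emotional features) and F maps back.
    """
    return np.mean(np.abs(x - x_cycled)) + np.mean(np.abs(y - y_cycled))

# Toy feature arrays (frames x feature bins), made up for illustration.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(4, 8))
emotional = rng.normal(size=(4, 8))

# A perfect round trip reconstructs the input exactly, so the loss is zero.
print(cycle_consistency_loss(neutral, neutral, emotional, emotional))
```

The cycle term is what lets training proceed without paired neutral/emotional recordings of the same sentence: each generator is only required to produce output that the other generator can map back to the original.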