CO-VADA: A Confidence-Oriented Voice Augmentation Debiasing Approach for Fair Speech Emotion Recognition

Yun-Shao Tsai; Yi-Cheng Lin; Huang-Cheng Chou; Hung-yi Lee

arXiv:2506.06071 · eess.AS · November 17, 2025

TL;DR

CO-VADA is a practical bias mitigation method for speech emotion recognition that uses voice conversion to generate diverse training samples, reducing demographic bias without changing model architecture or needing demographic labels.

Contribution

It introduces a scalable, model-agnostic approach that leverages voice conversion to create bias-diverse training data, enhancing fairness in SER systems.

Findings

01. Improves fairness across demographic groups in SER.

02. Compatible with various SER models and voice conversion tools.

03. Does not require demographic annotations or model modifications.

Abstract

Bias in speech emotion recognition (SER) systems often stems from spurious correlations between speaker characteristics and emotional labels, leading to unfair predictions across demographic groups. Many existing debiasing methods require model-specific changes or demographic annotations, limiting their practical use. We present CO-VADA, a Confidence-Oriented Voice Augmentation Debiasing Approach that mitigates bias without modifying model architecture or relying on demographic information. CO-VADA identifies training samples that reflect bias patterns present in the training data and then applies voice conversion to alter irrelevant attributes and generate samples. These augmented samples introduce speaker variations that differ from dominant patterns in the data, guiding the model to focus more on emotion-relevant features. Our framework is compatible with various SER models and voice…
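
The two steps the abstract names (pick training samples that reflect bias patterns, then voice-convert them) can be illustrated with a minimal sketch. The abstract does not spell out the selection rule, so everything below is an assumption made for illustration and not the authors' implementation: the sketch assumes that samples a reference model classifies with high confidence in their true label are the ones most likely to follow dominant patterns, and it takes the voice conversion step as a caller-supplied function. The names `select_confident_samples`, `augment_selected`, `toy_convert`, and the `quantile` parameter are hypothetical.

```python
import numpy as np


def select_confident_samples(probs, labels, quantile=0.8):
    """Return indices of samples whose true-label confidence is at or above
    the given quantile. ASSUMPTION: high confidence is used here as a proxy
    for 'reflects a dominant pattern'; the paper's actual rule may differ."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Probability the reference model assigns to each sample's true label.
    true_conf = probs[np.arange(len(labels)), labels]
    threshold = np.quantile(true_conf, quantile)
    return np.flatnonzero(true_conf >= threshold)


def augment_selected(waveforms, labels, indices, convert_voice):
    """Apply a caller-supplied conversion to the selected waveforms.
    Emotion labels are carried over unchanged, since the conversion is
    meant to alter only emotion-irrelevant speaker attributes."""
    return [(convert_voice(waveforms[i]), labels[i]) for i in indices]


def toy_convert(wave, rate=1.1):
    """Crude stand-in for a voice conversion model (NOT real voice
    conversion): linear-interpolation resampling, which shifts pitch
    and duration together."""
    wave = np.asarray(wave, dtype=float)
    n_out = max(1, int(round(len(wave) / rate)))
    src = np.linspace(0, len(wave) - 1, n_out)
    return np.interp(src, np.arange(len(wave)), wave)
```

In practice `toy_convert` would be replaced by an actual voice conversion tool, and the augmented pairs would be added to the original training set before training the SER model as usual, which is what keeps the approach independent of the model architecture.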

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining
