ERM-MinMaxGAP: Benchmarking and Mitigating Gender Bias in Multilingual Multimodal Speech-LLM Emotion Recognition
Zi Haur Pang, Xiaoxue Gao, Tatsuya Kawahara, Nancy F. Chen

TL;DR
This paper introduces a multilingual, multimodal benchmark for speech emotion recognition that measures gender bias across languages and modalities, and proposes a fairness-aware training method, ERM-MinMaxGAP, to mitigate that bias.
Contribution
The paper presents a novel benchmark for multilingual, multimodal speech emotion recognition (SER) and a fairness-aware training approach, ERM-MinMaxGAP, that reduces gender bias in SER systems.
Findings
Bias varies significantly across languages.
Multimodal fusion does not consistently improve fairness.
ERM-MinMaxGAP reduces the gender bias gap and improves SER performance.
Abstract
Speech emotion recognition (SER) systems can exhibit gender-related performance disparities, but how such bias manifests in multilingual speech LLMs across languages and modalities is unclear. We introduce a novel multilingual, multimodal benchmark built on MELD-ST, spanning English, Japanese, and German, to quantify language-specific SER performance and gender gaps. We find bias is strongly language-dependent, and multimodal fusion does not reliably improve fairness. To address these issues, we propose ERM-MinMaxGAP, a fairness-informed training objective, which augments empirical risk minimization (ERM) with a proposed adaptive fairness weight mechanism and a novel MinMaxGAP regularizer on the maximum male-female loss gap within each language and modality. Building upon the Qwen2-Audio backbone, our ERM-MinMaxGAP approach improves multilingual SER performance by 5.5% and 5.0% while reducing…
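The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one plausible reading of the regularizer: compute the absolute male-female gap in mean loss inside each (language, modality) group, then penalize the largest such gap. The function name `erm_minmaxgap_loss`, the gender labels "M"/"F", and the fixed weight `lam` are hypothetical. The paper's adaptive fairness weight mechanism is not specified in the abstract, so a constant stands in for it here.

```python
from collections import defaultdict


def erm_minmaxgap_loss(losses, genders, groups, lam=1.0):
    """Sketch of an ERM term plus a max male-female loss-gap penalty.

    losses  : per-sample loss values (floats)
    genders : per-sample labels, "M" or "F" (assumed encoding)
    groups  : per-sample (language, modality) keys
    lam     : fixed fairness weight (assumption; the paper adapts this weight)
    """
    # Standard ERM term: mean loss over all samples.
    erm = sum(losses) / len(losses)

    # Collect losses per (language, modality) group, split by gender.
    buckets = defaultdict(lambda: {"M": [], "F": []})
    for loss, gender, key in zip(losses, genders, groups):
        buckets[key][gender].append(loss)

    # Absolute gap between male and female mean loss in each group.
    gaps = {}
    for key, by_gender in buckets.items():
        male, female = by_gender["M"], by_gender["F"]
        if male and female:  # skip groups missing one gender
            gaps[key] = abs(sum(male) / len(male) - sum(female) / len(female))

    # Penalize only the worst (largest) gap across groups.
    max_gap = max(gaps.values(), default=0.0)
    return erm + lam * max_gap, erm, gaps


total, erm, gaps = erm_minmaxgap_loss(
    losses=[1.0, 3.0, 2.0, 2.0],
    genders=["M", "F", "M", "F"],
    groups=[("en", "audio"), ("en", "audio"), ("ja", "audio"), ("ja", "audio")],
    lam=0.5,
)
print(total)  # 3.0: ERM term 2.0 plus 0.5 times the largest gap (2.0, English audio)
```

Penalizing only the largest gap focuses the update on the worst-off language and modality pair. In actual training the losses would be differentiable tensors rather than Python floats.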
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Voice and Speech Disorders
