Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations

Sheng-Lun Wei; Yu-Ling Liao; Yen-Hua Chang; Hen-Hsen Huang; Hsin-Hsi Chen

arXiv:2602.01030 · cs.CL · February 3, 2026

TL;DR

This paper introduces the BiasInEar dataset and evaluates multilingual speech language models, revealing that they are sensitive to language and structural biases but relatively robust to demographic factors.

Contribution

It presents the first systematic speech bias assessment framework and a new multilingual speech benchmark for evaluating fairness and robustness in MLLMs.

Findings

01

Models are sensitive to language and option order biases.

02

Models show robustness to demographic factors like gender.

03

Architectural design influences robustness across languages.

Abstract

This work presents the first systematic investigation of speech bias in multilingual MLLMs. We construct and release the BiasInEar dataset, a speech-augmented benchmark based on Global MMLU Lite, spanning English, Chinese, and Korean, balanced by gender and accent, and totaling 70.8 hours (≈ 4,249 minutes) of speech with 11,200 questions. Using four complementary metrics (accuracy, entropy, APES, and Fleiss' κ), we evaluate nine representative models under linguistic (language and accent), demographic (gender), and structural (option order) perturbations. Our findings reveal that MLLMs are relatively robust to demographic factors but highly sensitive to language and option order, suggesting that speech can amplify existing structural biases. Moreover, architectural design and reasoning strategy substantially affect robustness across languages. Overall, this study…
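The abstract names entropy and Fleiss' κ as consistency metrics but does not spell out here how they are applied. The sketch below is one plausible reading, under stated assumptions: each question is answered by a model under several perturbation variants (for example, different option orders or speaker voices), answer entropy is computed per question over those variants, and Fleiss' κ treats questions as subjects and variants as raters. The function names answer_entropy and fleiss_kappa are hypothetical, and APES is omitted because its definition is not given in this summary.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (bits) of the answers one question received across variants.

    Assumption: 0 means the model gave the same answer under every perturbation.
    """
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fleiss_kappa(table, categories):
    """Fleiss' kappa with questions as subjects and perturbation variants as raters.

    table: one list of predicted answers per question; every list must have
    the same length n >= 2 (hypothetical layout, not the paper's data format).
    """
    N = len(table)
    n = len(table[0])
    if n < 2 or any(len(row) != n for row in table):
        raise ValueError("every question needs the same number (>= 2) of variants")

    # n_ij: how many variants chose category j for question i
    counts = []
    for row in table:
        c = Counter(row)
        counts.append([c[cat] for cat in categories])

    # P_i: observed pairwise agreement among variants for question i
    p_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / N

    # p_j: overall share of answers falling in category j; P_e: chance agreement
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # Every answer is identical; kappa is undefined, report perfect agreement by convention
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Three questions, each answered under four hypothetical option-order variants
    preds = [
        ["A", "A", "A", "A"],
        ["A", "B", "A", "B"],
        ["C", "C", "C", "D"],
    ]
    for q in preds:
        print(round(answer_entropy(q), 3))
    print(round(fleiss_kappa(preds, ["A", "B", "C", "D"]), 3))
```

For this toy input the entropies are 0, 1.0, and about 0.811 bits, and κ works out to 19/47 ≈ 0.404: the first question is perfectly stable, while the other two flip answers when the variant changes. Lower κ and higher entropy both indicate greater sensitivity to the perturbation.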


Code & Models

Datasets

ntunlplab/BiasInEar

Videos

Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations (Underline)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing