Demographic and Linguistic Bias Evaluation in Omnimodal Language Models

Alaa Elobaid

arXiv:2604.10014 · cs.CV · April 14, 2026

TL;DR

This study evaluates demographic and linguistic biases in omnimodal language models across text, image, audio, and video inputs. It finds the largest biases in audio tasks and underscores the need for fairness assessments.

Contribution

The paper evaluates four omnimodal models for bias across image, video, and audio tasks, measuring accuracy differences across age, gender, skin tone, language, and country of origin.

Findings

01. Image and video tasks show smaller demographic disparities than audio tasks.

02. Audio tasks exhibit larger biases and lower accuracy.

03. Biases vary significantly across demographic groups and modalities.

Abstract

This paper provides a comprehensive evaluation of demographic and linguistic biases in omnimodal language models that process text, images, audio, and video within a single framework. Although these models are being widely deployed, their performance across different demographic groups and modalities is not well studied. Four omnimodal models are evaluated on tasks that include demographic attribute estimation, identity verification, activity recognition, multilingual speech transcription, and language identification. Accuracy differences are measured across age, gender, skin tone, language, and country of origin. The results show that image and video understanding tasks generally exhibit better performance with smaller demographic disparities. In contrast, audio understanding tasks exhibit significantly lower performance and substantial bias, including large accuracy differences across…
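The abstract describes measuring accuracy differences across demographic groups. A minimal sketch of that kind of measurement follows, assuming per-sample correctness labels tagged with a group name. The function names, the toy data, and the choice of a max-minus-min gap as the disparity summary are illustrative assumptions; this excerpt does not state the paper's exact metric.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group from (group, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        hits[group] += int(is_correct)
    return {group: hits[group] / totals[group] for group in totals}


def max_accuracy_gap(per_group):
    """Gap between the best and worst group accuracy (assumed summary metric)."""
    values = list(per_group.values())
    return max(values) - min(values)


# Made-up toy data, not results from the paper.
toy_records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

per_group = accuracy_by_group(toy_records)
gap = max_accuracy_gap(per_group)
print(per_group, gap)
```

The same two functions could be applied separately per attribute (age, gender, skin tone, language, country of origin) and per task, so that gaps can be compared across modalities.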
