Demographic and Linguistic Bias Evaluation in Omnimodal Language Models
Alaa Elobaid

TL;DR
This study evaluates demographic and linguistic biases in omnimodal language models across text, images, audio, and video. It finds significant biases, especially in audio tasks, and emphasizes the need for fairness assessments.
Contribution
It provides a comprehensive bias evaluation across multiple modalities in omnimodal models, highlighting disparities and areas needing improvement.
Findings
Image and video tasks generally show better performance and smaller demographic disparities than audio tasks.
Audio tasks show significantly lower accuracy and substantial bias.
Biases vary significantly across demographic groups and modalities.
Abstract
This paper provides a comprehensive evaluation of demographic and linguistic biases in omnimodal language models that process text, images, audio, and video within a single framework. Although these models are being widely deployed, their performance across different demographic groups and modalities is not well studied. Four omnimodal models are evaluated on tasks that include demographic attribute estimation, identity verification, activity recognition, multilingual speech transcription, and language identification. Accuracy differences are measured across age, gender, skin tone, language, and country of origin. The results show that image and video understanding tasks generally exhibit better performance with smaller demographic disparities. In contrast, audio understanding tasks exhibit significantly lower performance and substantial bias, including large accuracy differences across…
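The accuracy differences mentioned in the abstract can be illustrated with a minimal sketch. The excerpt does not state the paper's exact disparity metric, so the max-min accuracy gap used here is an assumption, and the function names, group labels, and toy records are all hypothetical.

```python
from collections import defaultdict


def group_accuracies(records):
    """Return accuracy per group from (group, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        correct[group] += int(is_correct)
    return {group: correct[group] / totals[group] for group in totals}


def accuracy_gap(accuracies):
    """Assumed disparity measure: best-group accuracy minus worst-group accuracy."""
    return max(accuracies.values()) - min(accuracies.values())


# Toy, made-up predictions for two hypothetical demographic groups.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", True),
]

accs = group_accuracies(records)
gap = accuracy_gap(accs)
print(accs, gap)
```

The same two functions could be applied separately per attribute (age, gender, skin tone, language, country of origin) and per modality to compare gaps across tasks.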
