Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models

Zahraa Al Sahili; Ioannis Patras; Matthew Purver

arXiv:2505.14160 · cs.CL · November 20, 2025

TL;DR

This study systematically audits four multilingual vision-language models and finds that multilinguality often amplifies bias, especially in low-resource and grammatically gendered languages. The results underline the need for nuanced bias evaluation.

Contribution

The paper provides the first systematic bias audit of four public multilingual CLIP variants (M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2) across ten languages, showing how multilinguality influences stereotype propagation and bias amplification.

Findings

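01. Every model exhibits stronger gender skew than its English-only baseline.

02. Biases are amplified in low-resource and highly gendered languages; CAPIVARA-CLIP shows its largest biases in the low-resource languages it targets.

03. The shared encoders of NLLB-CLIP and SigLIP-2 transfer English gender stereotypes into gender-neutral languages.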

Abstract

Multilingual vision-language models (VLMs) promise universal image-text retrieval, yet their social biases remain underexplored. We perform the first systematic audit of four public multilingual CLIP variants: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2, covering ten languages that differ in resource availability and morphological gender marking. Using balanced subsets of FairFace and the PATA stereotype suite in a zero-shot setting, we quantify race and gender bias and measure stereotype amplification. Contrary to the intuition that multilinguality mitigates bias, every model exhibits stronger gender skew than its English-only baseline. CAPIVARA-CLIP shows its largest biases precisely in the low-resource languages it targets, while the shared encoder of NLLB-CLIP and SigLIP-2 transfers English gender stereotypes into gender-neutral languages; loosely coupled encoders…
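To make the zero-shot setting described above concrete, here is a minimal sketch of one common way to turn zero-shot predictions into a group disparity score. It is an illustration under stated assumptions, not the paper's metric: the function names (`zero_shot_predict`, `group_rates`, `max_gap`) are hypothetical, the disparity measure (largest difference in per-group prediction rates) is a generic choice, and the tiny hand-written embeddings stand in for real image and prompt embeddings from a multilingual CLIP model.

```python
import numpy as np

def zero_shot_predict(image_emb, text_emb):
    """For each image, return the index of the prompt with the highest cosine similarity."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return (img @ txt.T).argmax(axis=1)

def group_rates(pred, groups, target):
    """Fraction of images in each demographic group whose top prompt is `target`."""
    return {str(g): float(np.mean(pred[groups == g] == target))
            for g in np.unique(groups)}

def max_gap(rates):
    """Largest difference in prediction rate between any two groups (assumed metric)."""
    values = list(rates.values())
    return max(values) - min(values)

# Toy stand-ins (assumptions): 4 image embeddings, 2 prompt embeddings,
# and one group label per image, as a balanced subset would provide.
image_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2]])
text_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
groups = np.array(["female", "female", "male", "male"])

pred = zero_shot_predict(image_emb, text_emb)
rates = group_rates(pred, groups, target=0)
gap = max_gap(rates)
print(rates, gap)
```

In a multilingual audit, the same computation would be repeated with the prompts translated into each language, so that the gap for a given language can be compared against the English gap for the same model.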

Taxonomy

Topics: Categorization, perception, and language · Language, Metaphor, and Cognition

Methods: Contrastive Language-Image Pre-training