See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal, Kathleen McKeown

TL;DR
This paper investigates how cultural bias affects image understanding in vision-language models (VLMs), revealing a Western bias and showing that diverse language representation helps reduce it.
Contribution
It characterizes cultural bias in VLMs, identifies language diversity as a key factor, and shows that bias can be reduced when the relevant language is well represented during training.
Findings
VLMs perform better on Western images than on East Asian images.
Language diversity in training reduces cultural bias in models.
Bias can be mitigated even when prompting in English, provided the language associated with the target culture was well represented during training.
Abstract
Vision-language models (VLMs) can respond to queries about images in many languages. However, beyond language, culture affects how we see things. For example, individuals from Western cultures focus more on the central figure in an image while individuals from East Asian cultures attend more to scene context. In this work, we characterize the Western bias of VLMs in image understanding and investigate the role that language plays in this disparity. We evaluate VLMs across subjective and objective visual tasks with culturally diverse images and annotations. We find that VLMs perform better on the Western split than on the East Asian split of each task. Through controlled experimentation, we trace one source of this bias in image understanding to the lack of diversity in language model construction. While inference in a language nearer to a culture can lead to reductions in bias, we show…
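To make the idea of a performance gap between the Western and East Asian splits concrete, here is a minimal sketch that scores a model's predictions on each split and reports the difference. It assumes a single objective task scored by exact-match accuracy; the split names, the `Example` record, and the helper functions are hypothetical illustrations, and the paper's own metrics (particularly for its subjective tasks) may differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    split: str       # hypothetical split label: "western" or "east_asian"
    prediction: str  # model output for this image/query
    label: str       # gold annotation


def split_accuracy(examples, split):
    """Fraction of examples in `split` whose prediction matches the gold label."""
    subset = [ex for ex in examples if ex.split == split]
    if not subset:
        raise ValueError(f"no examples for split {split!r}")
    return sum(ex.prediction == ex.label for ex in subset) / len(subset)


def western_bias_gap(examples):
    """Western-split accuracy minus East Asian-split accuracy.

    Positive values mean the model does better on the Western split.
    """
    return split_accuracy(examples, "western") - split_accuracy(examples, "east_asian")


# Toy, made-up predictions used only to illustrate the calculation.
examples = [
    Example("western", "dog", "dog"),
    Example("western", "park", "park"),
    Example("east_asian", "dog", "cat"),
    Example("east_asian", "temple", "temple"),
]
print(western_bias_gap(examples))  # 0.5
```

A single signed difference is used here because it directly expresses "better on the Western split"; a real evaluation would likely average over several tasks and report uncertainty.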
Taxonomy
Topics: Language, Metaphor, and Cognition
Methods: Focus
