Evaluation of Cultural Competence of Vision-Language Models

Srishti Yadav; Lauren Tilton; Maria Antoniak; Taylor Arnold; Jiaang Li; Siddhesh Milind Pawar; Antonia Karamolegkou; Stella Frank; Zhaochong An; Negar Rostamzadeh; Daniel Hershcovich; Serge Belongie; Ekaterina Shutova

arXiv:2505.22793·cs.CV·August 15, 2025

TL;DR

This paper highlights the limitations of current vision-language models (VLMs) in understanding cultural nuances and proposes a set of five frameworks, grounded in visual culture studies, for evaluating their cultural competence.

Contribution

It proposes five frameworks, drawn from visual culture studies and each corresponding to a cultural dimension, for systematically analyzing the cultural competence of VLMs.

Findings

01. Current VLMs lack comprehensive cultural competence.

02. The proposed frameworks enable systematic cultural analysis of images.

03. The frameworks lay a foundation for future improvements in culturally aware VLMs.

Abstract

Modern vision-language models (VLMs) often fail at cultural competency evaluations and benchmarks. Given the diversity of applications built upon VLMs, there is renewed interest in understanding how they encode cultural nuances. While individual aspects of this problem have been studied, we still lack a comprehensive framework for systematically identifying and annotating the nuanced cultural dimensions present in images for VLMs. This position paper argues that foundational methodologies from visual culture studies (cultural studies, semiotics, and visual studies) are necessary for cultural analysis of images. Building upon this review, we propose a set of five frameworks, corresponding to cultural dimensions, that must be considered for a more complete analysis of the cultural competencies of VLMs.
