From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models

Mehar Bhatia; Sahithya Ravi; Aditya Chinchure; Eunjeong Hwang; Vered Shwartz

arXiv:2407.00263·cs.CL·July 2, 2024

TL;DR

This paper introduces the GlobalRG benchmark to evaluate vision-language models' ability to understand and retrieve culturally diverse images, highlighting significant performance disparities across different cultures.

Contribution

It proposes a new benchmark with tasks for assessing both universal and culture-specific concepts, addressing limitations of previous cultural inclusivity tests.

Findings

1. Models show varied performance across cultures.

2. Current models underperform on non-Western cultural concepts.

3. The benchmark reveals gaps in multicultural understanding.

Abstract

Despite recent advancements in vision-language models, their performance remains suboptimal on images from non-Western cultures due to underrepresentation in training datasets. Various benchmarks have been proposed to test models' cultural inclusivity, but they have limited coverage of cultures and do not adequately assess cultural diversity across universal as well as culture-specific local concepts. To address these limitations, we introduce the GlobalRG benchmark, comprising two challenging tasks: retrieval across universals and cultural visual grounding. The former task entails retrieving culturally diverse images for universal concepts from 50 countries, while the latter aims at grounding culture-specific concepts within images from 15 countries. Our evaluation across a wide range of models reveals that the performance varies significantly across cultures -- underscoring the…
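To make the "retrieval across universals" task more concrete, the sketch below shows one way a ranked list of retrieved images could be scored for both relevance and cultural spread. It is an illustration under stated assumptions, not the benchmark's own metric definitions, which the abstract does not give: the function names `precision_at_k` and `diversity_at_k`, the use of normalized Shannon entropy over country labels, and the sample inputs are all hypothetical.

```python
import math
from collections import Counter


def precision_at_k(relevant_flags, k):
    """Fraction of the top-k retrieved images that match the query concept.

    `relevant_flags` is a ranked list of booleans (hypothetical input format).
    """
    if k <= 0:
        return 0.0
    return sum(relevant_flags[:k]) / k


def diversity_at_k(countries, k):
    """Normalized Shannon entropy of country labels in the top-k results.

    Returns 0.0 when every result comes from one country and 1.0 when
    every result comes from a different country. This is an assumed
    diversity measure, not necessarily the one used by GlobalRG.
    """
    top = countries[:k]
    counts = Counter(top)
    if len(counts) <= 1:
        return 0.0
    n = len(top)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Maximum entropy for n results is log(n): all countries distinct.
    return entropy / math.log(n)


# Hypothetical ranked results for a universal concept such as "wedding".
relevant = [True, False, True, True]
spread = ["India", "Nigeria", "Brazil", "Japan"]
narrow = ["USA", "USA", "USA", "USA"]

p = precision_at_k(relevant, 4)
d_spread = diversity_at_k(spread, 4)
d_narrow = diversity_at_k(narrow, 4)
```

Under these assumptions, a model that returns relevant images from only one country would score well on precision but near zero on diversity, which is the kind of gap a culturally aware retrieval task is meant to expose.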

Taxonomy

Topics: Religious Tourism and Spaces