CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts

Malvina Nikandrou; Georgios Pantazopoulos; Nikolas Vitsakis; Ioannis Konstas; Alessandro Suglia

arXiv:2410.15453·cs.CL·February 7, 2025

TL;DR

CROPE is a benchmark designed to evaluate how well vision and language models understand and adapt to culture-specific concepts, revealing current limitations in their cultural knowledge and multimodal integration capabilities.

Contribution

We introduce CROPE, a new benchmark for assessing cultural understanding in vision and language models, highlighting their challenges in utilizing contextual information for culture-specific concepts.

Findings

01. Models perform poorly on culture-specific concepts compared to common ones.

02. Current VLMs struggle to effectively use multimodal contextual information.

03. Significant performance gaps indicate limited cultural adaptability of existing models.

Abstract

As Vision and Language models (VLMs) are reaching users across the globe, assessing their cultural understanding has become a critical challenge. In this paper, we introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts and evaluate the capacity for cultural adaptation through contextual information. This allows us to distinguish between parametric knowledge acquired during training and contextual knowledge provided during inference via visual and textual descriptions. Our evaluation of several state-of-the-art open VLMs shows large performance disparities between culture-specific and common concepts in the parametric setting. Moreover, experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions. Our findings reveal…
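The comparison described in the abstract (culture-specific versus common concepts, in a parametric versus a contextual setting) can be illustrated with a small scoring sketch. This is a minimal illustration under assumptions, not the authors' evaluation code: the record layout, the setting and concept-type labels, the helper names `accuracy_by_group` and `culture_gap`, and the toy values are all hypothetical and are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy for each (setting, concept_type) pair.

    Each record is a hypothetical tuple (setting, concept_type, is_correct).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for setting, concept_type, is_correct in records:
        key = (setting, concept_type)
        totals[key] += 1
        hits[key] += int(is_correct)
    return {key: hits[key] / totals[key] for key in totals}


def culture_gap(acc, setting):
    """Accuracy on common concepts minus accuracy on culture-specific ones."""
    return acc[(setting, "common")] - acc[(setting, "culture_specific")]


# Made-up placeholder outcomes, NOT results reported in the paper.
toy_records = (
    [("parametric", "common", True)] * 3
    + [("parametric", "common", False)] * 1
    + [("parametric", "culture_specific", True)] * 1
    + [("parametric", "culture_specific", False)] * 3
    + [("contextual", "common", True)] * 3
    + [("contextual", "common", False)] * 1
    + [("contextual", "culture_specific", True)] * 2
    + [("contextual", "culture_specific", False)] * 2
)

acc = accuracy_by_group(toy_records)
for setting in ("parametric", "contextual"):
    print(setting, "gap:", culture_gap(acc, setting))
```

A positive gap means the model answers questions about common concepts more accurately than questions about culture-specific ones in that setting; comparing the gap across the two settings indicates how much the provided context helps.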

Code & Models

Repositories

MalvinaNikandrou/crope (Official)

Videos

CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts

Taxonomy

Topics: Language, Metaphor, and Cognition