CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts
Malvina Nikandrou, Georgios Pantazopoulos, Nikolas Vitsakis, Ioannis Konstas, Alessandro Suglia

TL;DR
CROPE is a benchmark designed to evaluate how well vision and language models understand and adapt to culture-specific concepts, revealing current limitations in their cultural knowledge and multimodal integration capabilities.
Contribution
We introduce CROPE, a new benchmark for assessing cultural understanding in vision and language models, and show that these models have difficulty using contextual information about culture-specific concepts.
Findings
Models perform poorly on culture-specific concepts compared to common ones.
Current VLMs struggle to effectively use multimodal contextual information.
Significant performance gaps indicate limited cultural adaptability of existing models.
Abstract
As Vision and Language models (VLMs) are reaching users across the globe, assessing their cultural understanding has become a critical challenge. In this paper, we introduce CROPE, a visual question answering benchmark designed to probe the knowledge of culture-specific concepts and evaluate the capacity for cultural adaptation through contextual information. This allows us to distinguish between parametric knowledge acquired during training and contextual knowledge provided during inference via visual and textual descriptions. Our evaluation of several state-of-the-art open VLMs shows large performance disparities between culture-specific and common concepts in the parametric setting. Moreover, experiments with contextual knowledge indicate that models struggle to effectively utilize multimodal information and bind culture-specific concepts to their depictions. Our findings reveal…
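The comparison described in the abstract (culture-specific versus common concepts, under parametric and contextual settings) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the record fields `setting`, `concept_type`, and `correct`, the helper names, and the toy records are all hypothetical, and the toy values are invented for illustration rather than taken from the paper.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy for each (setting, concept_type) pair.

    Each record is a dict with hypothetical keys:
      setting      -- "parametric" or "contextual"
      concept_type -- "common" or "culture_specific"
      correct      -- bool, whether the model's answer was right
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        key = (record["setting"], record["concept_type"])
        totals[key] += 1
        hits[key] += int(record["correct"])
    return {key: hits[key] / totals[key] for key in totals}


def culture_gap(accuracies, setting):
    """Accuracy on common concepts minus accuracy on culture-specific ones."""
    return (accuracies[(setting, "common")]
            - accuracies[(setting, "culture_specific")])


# Invented toy records, NOT results from the paper.
toy_records = [
    {"setting": "parametric", "concept_type": "common", "correct": True},
    {"setting": "parametric", "concept_type": "common", "correct": True},
    {"setting": "parametric", "concept_type": "culture_specific", "correct": True},
    {"setting": "parametric", "concept_type": "culture_specific", "correct": False},
    {"setting": "contextual", "concept_type": "common", "correct": True},
    {"setting": "contextual", "concept_type": "culture_specific", "correct": True},
    {"setting": "contextual", "concept_type": "culture_specific", "correct": False},
]

toy_accuracies = accuracy_by_group(toy_records)
toy_parametric_gap = culture_gap(toy_accuracies, "parametric")
```

A positive gap in a given setting means the model answers questions about common concepts more accurately than questions about culture-specific ones. Comparing the gap across the two settings gives a rough indication of whether added context helps.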
Taxonomy
Topics: Language, Metaphor, and Cognition
