GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking

Florian Schneider; Carolin Holtermann; Chris Biemann; Anne Lauscher

arXiv:2502.13766·cs.CL·February 20, 2025

Open Access · 3 Datasets

TL;DR

This paper introduces GIMMICK, a multimodal benchmark for evaluating the cultural knowledge of large vision-language models (LVLMs) across 144 countries. The evaluation reveals a strong bias toward Western cultures and a correlation between model size and performance.

Contribution

GIMMICK is the first extensive multimodal benchmark assessing cultural knowledge across 144 countries and 728 cultural events or facets, enabling systematic evaluation of LVLMs.

Findings

01. Models show strong Western cultural biases.

02. Model size correlates with performance.

03. Multimodal input improves cultural understanding.

Abstract

Large Vision-Language Models (LVLMs) have recently gained attention due to their distinctive performance and broad applicability. While it has been previously shown that their efficacy in usage scenarios involving non-Western contexts falls short, existing studies are limited in scope, covering just a narrow range of cultures, focusing exclusively on a small number of cultural aspects, or evaluating a limited selection of models on a single task only. Towards globally inclusive LVLM research, we introduce GIMMICK, an extensive multimodal benchmark designed to assess a broad spectrum of cultural knowledge across 144 countries representing six global macro-regions. GIMMICK comprises six tasks built upon three new datasets that span 728 unique cultural events or facets on which we evaluated 20 LVLMs and 11 LLMs, including five proprietary and 26 open-weight models of all sizes. We…

Taxonomy

Topics: Speech and dialogue systems

Methods: Softmax · Attention Is All You Need