WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David, Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq, Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, Anirban Das, Ashmari, Pramodya, Aulia Adila, Bryan Wilie, Candy Olivia Mawalim

TL;DR
WorldCuisines is the largest multilingual multicultural VQA benchmark, assessing vision-language models' understanding of global cuisines across 30 languages, revealing current models' limitations in cultural and regional knowledge.
Contribution
Introduces the first large-scale, multicultural VQA benchmark with over 1 million data points across 30 languages, focusing on global cuisines and cultural knowledge.
Findings
VLMs perform better with location context
Models struggle with adversarial and regional cuisine questions
Knowledge base and data are released for future research
Abstract
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCulinary Culture and Tourism
