Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Jean de Dieu Nyandwi, Yueqi Song, Simran Khanuja, Graham Neubig

TL;DR
This paper introduces CulturalPangea, a culturally grounded multilingual multimodal model trained on CulturalGround, a large, culturally rich dataset. The model improves performance on culture-focused tasks while maintaining its general capabilities.
Contribution
The paper presents a novel data-centric approach that uses a large-scale knowledge graph (Wikidata) to build a diverse, multilingual, culturally rich dataset for training MLLMs. This approach yields state-of-the-art results among open models.
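The summary does not spell out how knowledge-graph facts become VQA pairs. The snippet below is only a minimal template-based sketch of the general idea: pair an entity's image with questions about its facts, in every language for which both a question template and an answer exist. The `Entity` record, the `TEMPLATES` table, and `make_vqa_pairs` are all hypothetical. The paper's actual generation procedure, which the abstract only describes as "synthetic", may differ substantially.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical record for one culturally significant knowledge-graph entity."""
    entity_id: str                      # illustrative identifier, not a real Wikidata QID
    image_path: str                     # image collected for the entity
    facts: dict[str, dict[str, str]]    # property -> {language code -> answer value}


# Hypothetical question templates: property -> {language code -> question text}.
TEMPLATES: dict[str, dict[str, str]] = {
    "name": {
        "en": "What is the name of what is shown in this image?",
        "es": "¿Cuál es el nombre de lo que se muestra en esta imagen?",
    },
    "country": {
        "en": "Which country is the subject of this image associated with?",
        "es": "¿Con qué país está asociado el tema de esta imagen?",
    },
}


def make_vqa_pairs(entity: Entity, languages: list[str]) -> list[dict[str, str]]:
    """Emit one VQA record per (property, language) that has both a template and an answer."""
    pairs = []
    for prop, values in entity.facts.items():
        templates = TEMPLATES.get(prop, {})
        for lang in languages:
            # Skip combinations with no template or no answer in that language.
            if lang in templates and lang in values:
                pairs.append({
                    "image": entity.image_path,
                    "language": lang,
                    "question": templates[lang],
                    "answer": values[lang],
                })
    return pairs


if __name__ == "__main__":
    entity = Entity(
        entity_id="E1",
        image_path="images/e1.jpg",
        facts={
            "name": {"en": "Example Temple", "es": "Templo de Ejemplo"},
            "country": {"en": "Examplestan"},  # no Spanish answer, so no Spanish pair
        },
    )
    for pair in make_vqa_pairs(entity, ["en", "es"]):
        print(pair["language"], "|", pair["question"], "->", pair["answer"])
```

In this sketch, a fact that lacks an answer in some language simply produces no pair for that language. Coverage across languages therefore depends directly on how complete the knowledge graph's multilingual labels are.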
Findings
CulturalPangea outperforms prior models by 5.0 on culture-focused benchmarks.
The dataset, CulturalGround, contains 22 million culturally rich VQA pairs spanning 42 countries and 39 languages.
The approach narrows the cultural gap in multilingual multimodal models.
Abstract
Multimodal Large Language Models (MLLMs) excel in high-resource settings, but often misinterpret long-tail cultural entities and underperform in low-resource languages. To address this gap, we propose a data-centric approach that directly grounds MLLMs in cultural knowledge. Leveraging a large-scale knowledge graph from Wikidata, we collect images that represent culturally significant entities and generate synthetic multilingual visual question answering (VQA) data. The resulting dataset, CulturalGround, comprises 22 million high-quality, culturally rich VQA pairs spanning 42 countries and 39 languages. We train an open-source MLLM, CulturalPangea, on CulturalGround, interleaving standard multilingual instruction-tuning data to preserve general abilities. CulturalPangea achieves state-of-the-art performance among open models on various culture-focused multilingual multimodal benchmarks, outperforming…
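The abstract says that general multilingual instruction-tuning data is interleaved with CulturalGround to preserve general abilities, but it does not give the mixing scheme. The following is a minimal sketch of one possible scheme. It assumes both sources are in-memory lists and draws the next example from the cultural source with a fixed probability. The `interleave` function and its `cultural_fraction` parameter are hypothetical illustrations, not the authors' training recipe or values.

```python
import random
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def interleave(cultural: Sequence[T], general: Sequence[T],
               cultural_fraction: float, seed: int = 0) -> Iterator[T]:
    """Yield every example from both sources exactly once, randomly interleaved.

    At each step the next example comes from `cultural` with probability
    `cultural_fraction` (a hypothetical knob, not a value from the paper).
    Each source keeps its own internal order. Once one source is exhausted,
    the remainder of the other is yielded in order.
    """
    if not 0.0 <= cultural_fraction <= 1.0:
        raise ValueError("cultural_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the mixed order is reproducible
    i = j = 0
    while i < len(cultural) and j < len(general):
        if rng.random() < cultural_fraction:
            yield cultural[i]
            i += 1
        else:
            yield general[j]
            j += 1
    yield from cultural[i:]
    yield from general[j:]


if __name__ == "__main__":
    cultural_vqa = [f"cultural_{k}" for k in range(4)]
    instruction_data = [f"general_{k}" for k in range(4)]
    print(list(interleave(cultural_vqa, instruction_data, cultural_fraction=0.5, seed=1)))
```

Because every example is used exactly once, the overall proportion of cultural to general data is set by the dataset sizes. `cultural_fraction` only controls how evenly the two sources are spread through the stream, which is why the sketch keeps it separate from any decision about how much of each dataset to include.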
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
