Multimodal Cultural Safety: Evaluation Framework and Alignment Strategies
Haoyi Qiu, Kung-Hsiang Huang, Ruichen Zheng, Jiao Sun, Nanyun Peng

TL;DR
This paper introduces CROSS, a comprehensive benchmark and evaluation framework for assessing and improving the cultural safety of large vision-language models across multiple languages and regions.
Contribution
The paper presents CROSS, a new culturally grounded benchmark, and CROSS-Eval, an intercultural evaluation framework, along with strategies to improve models' cultural safety and alignment.
Findings
Current LVLMs show significant gaps in cultural safety awareness and compliance.
Open-source models can match GPT-4o's performance with proper tuning.
Fine-tuning and preference tuning substantially improve cultural safety metrics.
Abstract
Large vision-language models (LVLMs) are increasingly deployed in globally distributed applications, such as tourism assistants, yet their ability to produce culturally appropriate responses remains underexplored. Existing multimodal safety benchmarks primarily focus on physical safety and overlook violations rooted in cultural norms, which can result in symbolic harm. To address this gap, we introduce CROSS, a benchmark designed to assess the cultural safety reasoning capabilities of LVLMs. CROSS includes 1,284 multilingual visually grounded queries from 16 countries, three everyday domains, and 14 languages, where cultural norm violations emerge only when images are interpreted in context. We propose CROSS-Eval, an intercultural theory-based framework that measures four key dimensions: cultural awareness, norm education, compliance, and helpfulness. Using this framework, we evaluate…
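As a rough illustration of how per-dimension results along the four CROSS-Eval dimensions might be tabulated, the sketch below averages scores over a handful of responses. Only the four dimension names come from the abstract. The 0-to-1 scale, the `ScoredResponse` record, the placeholder country labels, and the plain averaging are assumptions made for this example, not the paper's actual scoring procedure.

```python
from dataclasses import dataclass
from statistics import mean

# Dimension names follow the abstract; everything else here is hypothetical.
DIMENSIONS = ("awareness", "education", "compliance", "helpfulness")


@dataclass
class ScoredResponse:
    """One model response with a score per dimension (assumed 0-1 scale)."""
    country: str
    language: str
    scores: dict  # dimension name -> float in [0, 1]


def mean_per_dimension(responses):
    """Average each dimension over a list of scored responses."""
    return {d: mean(r.scores[d] for r in responses) for d in DIMENSIONS}


# Made-up scores purely to exercise the function.
responses = [
    ScoredResponse("country_1", "en", {"awareness": 1.0, "education": 0.5,
                                       "compliance": 1.0, "helpfulness": 0.5}),
    ScoredResponse("country_2", "es", {"awareness": 0.0, "education": 0.5,
                                       "compliance": 0.0, "helpfulness": 1.0}),
]

summary = mean_per_dimension(responses)
print(summary)
```

Keeping the dimensions separate, rather than collapsing them into one number, makes it visible when a model is helpful but not norm-compliant, which is the kind of gap the findings describe.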
Taxonomy
Topics: Speech and dialogue systems · Language, Metaphor, and Cognition · Discourse Analysis in Language Studies
