XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs
Mohsinul Kabir, Tasnim Ahmed, Md Mezbaur Rahman, Shaoxiong Ji, Hassan Alhuzali, Sophia Ananiadou

TL;DR
XCR-Bench is a benchmark of parallel cross-cultural sentences for evaluating how well LLMs understand and adapt to diverse cultural contexts. It reveals consistent weaknesses in handling social etiquette and cultural references, as well as regional and ethno-religious biases.
Contribution
The paper presents XCR-Bench, a new benchmark dataset with 4.9k parallel sentences and 1,098 unique Culture-Specific Items (CSIs). It integrates Newmark's CSI framework with Hall's Triad of Culture to support systematic evaluation of LLMs' cultural reasoning capabilities.
Findings
LLMs struggle with social etiquette and cultural references.
LLMs encode regional and ethno-religious biases.
State-of-the-art models show consistent weaknesses in cultural reasoning.
Abstract
Cross-cultural competence in large language models (LLMs) requires the ability to identify Culture-Specific Items (CSIs) and to adapt them appropriately across cultural contexts. Progress in evaluating this capability has been constrained by the scarcity of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. To address this limitation, we introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark consisting of 4.9k parallel sentences and 1,098 unique CSIs, spanning three distinct reasoning tasks with corresponding evaluation metrics. Our corpus integrates Newmark's CSI framework with Hall's Triad of Culture, enabling systematic analysis of cultural reasoning beyond surface-level artifacts and into semi-visible and invisible cultural elements such as social norms, beliefs, and values. Our findings show that state-of-the-art LLMs exhibit consistent weaknesses…
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Computational and Text Analysis Methods
