Do Large Language Models Truly Understand Cross-cultural Differences?

Shiwei Guo; Sihang Jiang; Qianxi He; Yanghua Xiao; Jiaqing Liang; Bi Yude; Minggui He; Shimin Tao; Li Zhang

arXiv:2512.07075·cs.CL·December 9, 2025

Open Access

TL;DR

This paper introduces SAGE, a comprehensive benchmark for evaluating large language models' ability to understand and reason about cross-cultural differences, addressing existing evaluation gaps.

Contribution

We propose SAGE, a scenario-based benchmark grounded in cultural theory, with curated concepts and test items to assess LLMs' cross-cultural understanding and reasoning.

Findings

01

LLMs show systematic weaknesses in cross-cultural reasoning.

02

The SAGE benchmark is transferable across languages.

03

Models still lack nuanced cross-cultural understanding.

Abstract

In recent years, large language models (LLMs) have demonstrated strong performance on multilingual tasks. Given its wide range of applications, cross-cultural understanding capability is a crucial competency. However, existing benchmarks for evaluating whether LLMs genuinely possess this capability suffer from three key limitations: a lack of contextual scenarios, insufficient cross-cultural concept mapping, and limited deep cultural reasoning capabilities. To address these gaps, we propose SAGE, a scenario-based benchmark built via cross-cultural core concept alignment and generative task design, to evaluate LLMs' cross-cultural understanding and reasoning. Grounded in cultural theory, we categorize cross-cultural capabilities into nine dimensions. Using this framework, we curated 210 core concepts and constructed 4530 test items across 15 specific real-world scenarios, organized under…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Computational and Text Analysis Methods