CURE: Cultural Understanding and Reasoning Evaluation - A Framework for "Thick" Culture Alignment Evaluation in LLMs

Truong Vo; Sanmi Koyejo

arXiv:2511.12014·cs.CL·November 18, 2025

Open Access

TL;DR

This paper introduces CURE, a framework that evaluates cultural understanding in large language models using realistic scenarios and multiple metrics, and shows where current "thin" evaluations fall short.

Contribution

The paper proposes a "thick" evaluation framework that pairs realistic situational contexts with Exact Match and four complementary metrics (Coverage, Specificity, Connotation, and Coherence) to assess cultural reasoning in LLMs.

Findings

01. Thin evaluations overestimate cultural competence.

02. Thick evaluations reveal reasoning depth and reduce assessment variance.

03. Thick evaluation provides more stable and interpretable cultural understanding signals.

Abstract

Large language models (LLMs) are increasingly deployed in culturally diverse environments, yet existing evaluations of cultural competence remain limited. Current methods focus on de-contextualized correctness or forced-choice judgments, overlooking the cultural understanding and reasoning required for appropriate responses. To address this gap, we introduce a set of benchmarks that, instead of directly probing abstract norms or isolated statements, present models with realistic situational contexts that require culturally grounded reasoning. In addition to the standard Exact Match metric, we introduce four complementary metrics (Coverage, Specificity, Connotation, and Coherence) to capture different dimensions of a model's response quality. Empirical analysis across frontier models reveals that thin evaluation systematically overestimates cultural competence and produces…

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Psychometric Methodologies and Testing