Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs
Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou

TL;DR
This paper critiques the prevalent closed-choice survey methods for evaluating cultural alignment in LLMs, demonstrating that more flexible, unconstrained approaches reveal stronger alignment and expose limitations of traditional evaluations.
Contribution
The paper introduces unconstrained evaluation methods for cultural alignment in LLMs, using the World Values Survey (WVS) and Hofstede Cultural Dimensions as case studies, and shows how unstable and limited closed-style assessments can be.
Findings
LLMs show stronger cultural alignment in unconstrained settings
Minor survey changes cause inconsistent LLM responses
Closed-style evaluations have significant limitations
Abstract
A large number of studies rely on closed-style multiple-choice surveys to evaluate cultural alignment in Large Language Models (LLMs). In this work, we challenge this constrained evaluation paradigm and explore more realistic, unconstrained approaches. Using the World Values Survey (WVS) and Hofstede Cultural Dimensions as case studies, we demonstrate that LLMs exhibit stronger cultural alignment in less constrained settings, where responses are not forced. Additionally, we show that even minor changes, such as reordering survey choices, lead to inconsistent outputs, exposing the limitations of closed-style evaluations. Our findings advocate for more robust and flexible evaluation frameworks that focus on specific cultural proxies, encouraging more nuanced and accurate assessments of cultural alignment in LLMs.
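The reordering check mentioned in the abstract can be illustrated with a small sketch. This is not the authors' protocol: `build_prompt`, `reorder_consistency`, and the `ask_model` callable are hypothetical names introduced here, and the two stub "models" stand in for a real LLM call. The idea is to present the same question under every ordering of its options, map each letter answer back to the option text, and report how often the most frequent choice is selected.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Sequence

LETTERS = "ABCDEFGHIJ"


def build_prompt(question: str, options: Sequence[str]) -> str:
    """Format a closed-style multiple-choice prompt (hypothetical layout)."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def reorder_consistency(
    question: str,
    options: Sequence[str],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of option orderings on which the model picks its modal option.

    `ask_model` is an assumed callable that takes a prompt and returns a
    letter such as "A"; replies that do not map to an option are ignored
    when finding the modal choice but still count in the denominator.
    """
    picks = []
    for ordering in permutations(options):
        reply = ask_model(build_prompt(question, ordering)).strip().upper()
        idx = LETTERS.find(reply[0]) if reply else -1
        picks.append(ordering[idx] if 0 <= idx < len(ordering) else None)
    valid = [p for p in picks if p is not None]
    if not valid:
        return 0.0
    return Counter(valid).most_common(1)[0][1] / len(picks)


# Stub models for illustration only (assumptions, not real LLMs).
def position_biased(prompt: str) -> str:
    return "A"  # always picks whatever is listed first


def content_based(prompt: str) -> str:
    for line in prompt.splitlines():
        if line.endswith(". Agree"):
            return line[0]  # always picks "Agree", wherever it appears
    return ""


question = "Family is very important in life."
options = ["Agree", "Neutral", "Disagree"]
stable = reorder_consistency(question, options, content_based)
unstable = reorder_consistency(question, options, position_biased)
```

A score near 1.0 means the choice survives reordering; a score near 1/len(options) suggests the answer tracks position rather than content. Enumerating every permutation is only practical for short option lists, so a longer survey item would need sampled orderings instead.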
Taxonomy
Topics: Artificial Intelligence in Law · Interpreting and Communication in Healthcare · Translation Studies and Practices
Methods: Focus
