Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs

Mohsinul Kabir; Ajwad Abrar; Sophia Ananiadou

arXiv:2502.08045·cs.CL·November 14, 2025

Open Access

TL;DR

This paper critiques the prevalent closed-choice survey methods for evaluating cultural alignment in LLMs, demonstrating that more flexible, unconstrained approaches reveal stronger alignment and expose limitations of traditional evaluations.

Contribution

It introduces unconstrained evaluation methods using real-world data, highlighting the variability and limitations of closed-style assessments for cultural alignment in LLMs.

Findings

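01 LLMs show stronger cultural alignment in unconstrained settings, where responses are not forced

02 Minor survey changes, such as reordering the answer choices, cause inconsistent LLM responses

03 Closed-style multiple-choice evaluations are therefore an unreliable measure of cultural alignment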

Abstract

A large number of studies rely on closed-style multiple-choice surveys to evaluate cultural alignment in Large Language Models (LLMs). In this work, we challenge this constrained evaluation paradigm and explore more realistic, unconstrained approaches. Using the World Values Survey (WVS) and Hofstede Cultural Dimensions as case studies, we demonstrate that LLMs exhibit stronger cultural alignment in less constrained settings, where responses are not forced. Additionally, we show that even minor changes, such as reordering survey choices, lead to inconsistent outputs, exposing the limitations of closed-style evaluations. Our findings advocate for more robust and flexible evaluation frameworks that focus on specific cultural proxies, encouraging more nuanced and accurate assessments of cultural alignment in LLMs.
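
The order-sensitivity finding can be made concrete with a small check. What follows is a minimal sketch, not the authors' protocol: the function `order_consistency`, the two toy models, and the example question are all hypothetical names and data introduced here for illustration. It presents the same closed-style question under every ordering of its choices and reports how often the model lands on its most frequent answer.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Sequence


def order_consistency(
    question: str,
    options: Sequence[str],
    ask_model: Callable[[str, Sequence[str]], int],
) -> float:
    """Share of option orderings on which the model gives its modal answer.

    `ask_model` is a hypothetical callable that returns the index of the
    chosen option within the ordering it was shown.
    """
    picks = []
    for ordering in permutations(options):
        chosen_index = ask_model(question, ordering)
        # Map the positional choice back to the option text it refers to.
        picks.append(ordering[chosen_index])
    modal_count = Counter(picks).most_common(1)[0][1]
    return modal_count / len(picks)


def first_option_model(question: str, ordering: Sequence[str]) -> int:
    """Toy stand-in for a position-biased model: always picks the first choice."""
    return 0


def content_model(question: str, ordering: Sequence[str]) -> int:
    """Toy stand-in for an order-invariant model: always picks 'Agree'."""
    return list(ordering).index("Agree")


# Illustrative placeholder item, not taken from WVS or Hofstede.
question = "Is it important to follow traditions?"
options = ["Agree", "Neutral", "Disagree"]

biased_score = order_consistency(question, options, first_option_model)
stable_score = order_consistency(question, options, content_model)
```

A score of 1.0 means the answer never changes with the ordering. The position-biased toy model scores far lower because its answer tracks whichever option happens to come first. In a real study, `ask_model` would wrap an actual LLM call and parse the selected choice from its output.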


Taxonomy

Topics: Artificial Intelligence in Law · Interpreting and Communication in Healthcare · Translation Studies and Practices

MethodsFocus