Understanding Cultural Alignment in Multilingual LLMs via Natural Debate Statements
Vlad-Andrei Negru, Camelia Lemnaru, Mihai Surdeanu, Rodica Potolea

TL;DR
This paper introduces a new dataset and methodology for analyzing sociocultural values in multilingual LLMs, revealing that models tend to reflect the norms of the country in which they were developed rather than adapting to diverse user backgrounds.
Contribution
The paper presents a novel dataset, Sociocultural Statements, and a synthetic labeling approach to quantify and compare sociocultural norms in LLMs from different countries.
Findings
Culturally distinct LLMs mirror the norms of their country of origin.
Models show limited ability to adapt to diverse sociocultural backgrounds.
Human validation confirms the accuracy of synthetic labels.
Abstract
In this work, we investigate the sociocultural values learned by large language models (LLMs). We introduce a novel open-access dataset, Sociocultural Statements, constructed from natural debate statements using a multi-step methodology. The dataset is synthetically labeled to enable the quantification of sociocultural norms and beliefs that LLMs exhibit in their responses to these statements, according to the Hofstede cultural dimensions. We verify the accuracy of synthetic labels using human quality control on a representative sample. We conduct a comparative analysis between two groups of LLMs developed in different countries (U.S. and China), and use as a comparative baseline patterns observed in human measurements. Using this new dataset and the analysis above, we found that culturally-distinct LLMs reflect the values and norms of the countries in which they were developed,…
Taxonomy
Topics: Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI) · Topic Modeling
