Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires
Simon Münker

TL;DR
This study reveals that large language models fail to accurately reflect diverse cultural moral frameworks, homogenizing moral views across cultures and highlighting limitations in current AI alignment methods.
Contribution
It provides empirical evidence that state-of-the-art LLMs do not preserve cultural moral diversity, challenging their use in social science and calling for improved alignment strategies.
Findings
LLMs homogenize moral diversity across cultures
Increased model size does not consistently improve cultural representation
Current AI alignment approaches are insufficient for capturing nuanced moral values
Abstract
Are AI systems truly representing human values, or merely averaging across them? Our study suggests a concerning reality: Large Language Models (LLMs) fail to represent diverse cultural moral frameworks despite their linguistic capabilities. We expose significant gaps between AI-generated and human moral intuitions by applying the Moral Foundations Questionnaire across 19 cultural contexts. Comparing multiple state-of-the-art LLMs' origins against human baseline data, we find these models systematically homogenize moral diversity. Surprisingly, increased model size doesn't consistently improve cultural representation fidelity. Our findings challenge the growing use of LLMs as synthetic populations in social science research and highlight a fundamental limitation in current AI alignment approaches. Without data-driven alignment beyond prompting, these systems cannot capture the nuanced,…
Taxonomy
Topics: Computational and Text Analysis Methods
