
TL;DR
This study evaluates whether Claude, a model aligned with Constitutional AI, reflects cultural biases by comparing its responses to World Values Survey data. It finds that Claude mirrors the values of a particular group of cultures and may reinforce those biases.
Contribution
It provides empirical evidence that a model aligned with Constitutional AI can encode the cultural perspectives of a particular group, highlighting a limitation of current alignment methods.
Findings
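Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items it extends beyond the range of all surveyed populations.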
Cultural context influences rhetorical framing but not core values.
Removing system prompts increases refusals without changing value responses.
Abstract
Constitutional AI (CAI) aligns language models with explicitly stated normative principles, offering a transparent alternative to implicit alignment through human feedback alone. However, because constitutions are authored by specific groups of people, the resulting models may reflect particular cultural perspectives. We investigate this question by evaluating Anthropic's Claude Sonnet on 55 World Values Survey items, selected for high cross-cultural variance across six value domains and administered as both direct survey questions and naturalistic advice-seeking scenarios. Comparing Claude's responses to country-level data from 90 nations, we find that Claude's value profile most closely resembles those of Northern European and Anglophone countries, but on a majority of items extends beyond the range of all surveyed populations. When users provide cultural context, Claude adjusts its…
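The comparison described in the abstract can be illustrated with a minimal sketch. It assumes each survey item is rescaled to [0, 1] and that similarity is measured by Euclidean distance between the model's item scores and each country's mean scores. The abstract does not state the paper's actual metric or normalization, so both are assumptions here. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_countries(model_scores, country_means, k=3):
    """Return the k countries whose mean item scores are closest to the model's."""
    ranked = sorted(
        country_means,
        key=lambda country: euclidean(model_scores, country_means[country]),
    )
    return ranked[:k]


def fraction_outside_range(model_scores, country_means):
    """Share of items where the model's score lies outside the
    [min, max] interval spanned by all country means."""
    outside = 0
    for i, score in enumerate(model_scores):
        values = [means[i] for means in country_means.values()]
        if score < min(values) or score > max(values):
            outside += 1
    return outside / len(model_scores)


# Toy data (hypothetical): three countries, three items, scores in [0, 1].
country_means = {
    "A": [0.2, 0.5, 0.9],
    "B": [0.4, 0.6, 0.7],
    "C": [0.8, 0.1, 0.3],
}
model_scores = [0.1, 0.55, 0.95]

closest = nearest_countries(model_scores, country_means, k=2)
share_outside = fraction_outside_range(model_scores, country_means)
print(closest, round(share_outside, 2))
```

In this toy setup the model is closest to country A yet falls outside the range of all three countries on two of the three items, which mirrors the pattern the abstract reports: a profile nearest to one cluster of countries while still more extreme than any surveyed population on most items.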
