Differences in the Moral Foundations of Large Language Models
Peter Kirgis

TL;DR
This paper uses moral foundations theory to examine how large language models differ from humans and from one another in their moral judgments, and finds that these differences grow with model capability.
Contribution
It applies moral foundations theory to analyze LLMs, highlighting biases and differences from human moral judgments and among models themselves.
Findings
Models rely on different moral foundations from humans.
Differences in moral judgments increase with model capabilities.
Models show both bias and variance relative to a human baseline.
Abstract
Large language models are increasingly being used in critical domains of politics, business, and education, but the nature of their normative ethical judgment remains opaque. Alignment research has, to date, not sufficiently utilized perspectives and insights from the field of moral psychology to inform training and evaluation of frontier models. I perform a synthetic experiment on a wide range of models from most major model providers using Jonathan Haidt's influential moral foundations theory (MFT) to elicit diverse value judgments from LLMs. Using multiple descriptive statistical approaches, I document the bias and variance of large language model responses relative to a human baseline in the original survey. My results suggest that models rely on different moral foundations from one another and from a nationally representative human baseline, and these differences increase as model…
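The bias and variance comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's code or method: the 0-5 rating scale, the function name bias_and_variance, and every number below are hypothetical placeholders chosen for illustration. Only the five foundation names come from moral foundations theory. Bias is taken here as a model's mean score minus the human baseline for a foundation, and variance as the spread of the model's repeated responses.

```python
from statistics import mean, pvariance

# Hypothetical placeholder data on an assumed 0-5 rating scale.
# These numbers are illustrative only and are NOT results from the paper.
human_baseline = {
    "care": 3.5,
    "fairness": 3.5,
    "loyalty": 2.0,
    "authority": 2.0,
    "sanctity": 1.5,
}

# Repeated responses from one hypothetical model, grouped by foundation.
model_samples = {
    "care": [4.0, 5.0],
    "fairness": [4.0, 4.0],
    "loyalty": [1.0, 2.0],
    "authority": [1.0, 1.0],
    "sanctity": [1.0, 2.0],
}


def bias_and_variance(samples, baseline):
    """Return per-foundation bias (mean minus baseline) and population variance."""
    summary = {}
    for foundation, scores in samples.items():
        summary[foundation] = {
            "bias": mean(scores) - baseline[foundation],
            "variance": pvariance(scores),
        }
    return summary


summary = bias_and_variance(model_samples, human_baseline)
for foundation, stats in summary.items():
    print(f"{foundation:<10} bias={stats['bias']:+.2f} variance={stats['variance']:.2f}")
```

A positive bias means the hypothetical model scores a foundation higher than the human baseline, and a negative bias means lower. The paper's own statistical approaches may differ from this simple mean-difference summary.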
Taxonomy
Topics: Ethics and Social Impacts of AI · Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI)
