The Moral Mind(s) of Large Language Models
Avner Seror

TL;DR
This paper investigates whether large language models exhibit a consistent moral structure by analyzing their responses to ethical dilemmas, revealing shared core principles and notable variations in moral reasoning across models.
Contribution
The study introduces a novel application of revealed preference theory to assess moral preferences in nearly 40 LLMs, providing a new framework for evaluating ethical alignment.
Findings
At least one model from each major provider exhibits behavior consistent with approximately stable moral preferences.
Most models' estimated utility functions cluster around neutral moral stances, with some variation.
A shared core of moral reasoning exists alongside meaningful heterogeneity across models.
Abstract
As large language models (LLMs) increasingly participate in tasks with ethical and societal stakes, a critical question arises: do they exhibit an emergent "moral mind" - a consistent structure of moral preferences guiding their decisions - and to what extent is this structure shared across models? To investigate this, we applied tools from revealed preference theory to nearly 40 leading LLMs, presenting each with many structured moral dilemmas spanning five foundational dimensions of ethical reasoning. Using a probabilistic rationality test, we found that at least one model from each major provider exhibited behavior consistent with approximately stable moral preferences, acting as if guided by an underlying utility function. We then estimated these utility functions and found that most models cluster around neutral moral stances. To further characterize heterogeneity, we employed a…
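To make the idea of a rationality test concrete, here is a minimal sketch of a deterministic revealed-preference consistency check, the Generalized Axiom of Revealed Preference (GARP). This is not the paper's procedure: the abstract describes a probabilistic test whose details are not given here. The sketch assumes, purely for illustration, that each dilemma can be encoded as a budget-like trade-off with a hypothetical `prices` vector and a chosen `choices` bundle. If the data pass GARP, the choices can be rationalized by some utility function, which is the sense in which a model acts "as if guided by an underlying utility function."

```python
def satisfies_garp(prices, choices, tol=1e-9):
    """Return True if the observed choices satisfy GARP.

    prices[i]  : trade-off weights faced in observation i (hypothetical encoding)
    choices[i] : bundle chosen in observation i (hypothetical encoding)
    """
    n = len(choices)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # cost[i][j] = cost of bundle j at the prices of observation i
    cost = [[dot(prices[i], choices[j]) for j in range(n)] for i in range(n)]

    # Direct weak revealed preference: bundle i was chosen although
    # bundle j was affordable at the same prices.
    weak = [[cost[i][i] >= cost[i][j] - tol for j in range(n)] for i in range(n)]

    # Transitive closure of the weak relation (Warshall's algorithm).
    for k in range(n):
        for i in range(n):
            if weak[i][k]:
                for j in range(n):
                    if weak[k][j]:
                        weak[i][j] = True

    # GARP is violated if i is (indirectly) revealed preferred to j
    # while j is strictly directly revealed preferred to i.
    for i in range(n):
        for j in range(n):
            if weak[i][j] and cost[j][j] > cost[j][i] + tol:
                return False
    return True


# Toy data, invented for illustration only.
consistent = satisfies_garp([(1, 2), (2, 1)], [(4, 1), (1, 4)])
inconsistent = satisfies_garp([(1, 2), (2, 1)], [(1, 4), (4, 1)])
```

In the second toy dataset each bundle is chosen when the other is strictly cheaper, so no single utility function can explain both choices. A probabilistic test, as described in the abstract, would relax this all-or-nothing verdict.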
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Sparse Evolutionary Training
