The Moral Mind(s) of Large Language Models

Avner Seror

arXiv:2412.04476 · cs.CY · April 28, 2025

Open Access

TL;DR

This paper investigates whether large language models exhibit a consistent moral structure by analyzing their responses to ethical dilemmas, revealing shared core principles and notable variations in moral reasoning across models.

Contribution

The study introduces a novel application of revealed preference theory to assess moral preferences in nearly 40 LLMs, providing a new framework for evaluating ethical alignment.

Findings

01. At least one model from each major provider exhibits behavior consistent with approximately stable moral preferences.

02. Most models cluster around neutral moral stances, with some variation.

03. A shared core of moral reasoning coexists with meaningful heterogeneity across models.

Abstract

As large language models (LLMs) increasingly participate in tasks with ethical and societal stakes, a critical question arises: do they exhibit an emergent "moral mind" - a consistent structure of moral preferences guiding their decisions - and to what extent is this structure shared across models? To investigate this, we applied tools from revealed preference theory to nearly 40 leading LLMs, presenting each with many structured moral dilemmas spanning five foundational dimensions of ethical reasoning. Using a probabilistic rationality test, we found that at least one model from each major provider exhibited behavior consistent with approximately stable moral preferences, acting as if guided by an underlying utility function. We then estimated these utility functions and found that most models cluster around neutral moral stances. To further characterize heterogeneity, we employed a…
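To make the idea of a revealed-preference rationality test concrete, here is a minimal sketch of the textbook Generalized Axiom of Revealed Preference (GARP) check. It is not the paper's procedure: the abstract describes a probabilistic test, whereas this check is deterministic. It assumes, purely for illustration, that each answer to a dilemma can be encoded as a bundle chosen under a linear budget. The function name `satisfies_garp`, the price/bundle encoding, and the sample data are all hypothetical.

```python
def satisfies_garp(prices, bundles, tol=1e-9):
    """Return True if the observed choices are consistent with GARP.

    prices[i] and bundles[i] are equal-length sequences of numbers:
    bundle i was chosen when prices i were in effect (assumed encoding).
    """
    n = len(bundles)

    def cost(i, j):
        # Expenditure needed to buy bundle j at the prices of observation i.
        return sum(p * q for p, q in zip(prices[i], bundles[j]))

    # weak[i][j]: bundle j was affordable when bundle i was chosen,
    # so bundle i is directly revealed preferred to bundle j.
    weak = [[cost(i, i) >= cost(i, j) - tol for j in range(n)] for i in range(n)]
    # strict[i][j]: bundle j was strictly cheaper than bundle i at prices i.
    strict = [[cost(i, i) > cost(i, j) + tol for j in range(n)] for i in range(n)]

    # Transitive closure of the weak relation (Warshall's algorithm).
    closure = [row[:] for row in weak]
    for k in range(n):
        for i in range(n):
            if closure[i][k]:
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = True

    # GARP fails if i is (indirectly) revealed preferred to j
    # while j is strictly directly revealed preferred to i.
    return not any(
        closure[i][j] and strict[j][i] for i in range(n) for j in range(n)
    )


# Hypothetical data: two observations over two goods.
consistent = satisfies_garp([[1, 2], [2, 1]], [[4, 1], [1, 4]])
violating = satisfies_garp([[1, 2], [2, 1]], [[1, 4], [4, 1]])
print(consistent, violating)  # True False
```

Choices that pass this check can be rationalized by some utility function, which is the sense in which a model can be said to act as if guided by one. A probabilistic test of the kind the abstract mentions would additionally have to account for randomness in a model's answers, which this sketch ignores.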

Taxonomy

Topics: Computational and Text Analysis Methods

Methods: Sparse Evolutionary Training