The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
Basile Garcia, Crystal Qian, Stefano Palminteri

TL;DR
This paper investigates how well large language models align with human morals in decision-making, revealing biases and differences in perception through a large-scale study and linguistic analysis.
Contribution
It introduces a large corpus of human and LLM responses to moral dilemmas and systematically analyzes human-LLM moral alignment and perception biases.
Findings
LLMs and humans both tend to reject morally complex utilitarian dilemmas
Participants prefer LLMs' moral assessments but show anti-AI bias
Linguistic differences affect detection and agreement with responses
Abstract
As large language models (LLMs) become increasingly integrated into society, their alignment with human morals is crucial. To better understand this alignment, we created a large corpus of human- and LLM-generated responses to various moral scenarios. We found a misalignment between human and LLM moral assessments; although both LLMs and humans tended to reject morally complex utilitarian dilemmas, LLMs were more sensitive to personal framing. We then conducted a quantitative user study (N=230) in which participants evaluated these responses, judging whether each was AI-generated and rating their agreement with it. Human evaluators preferred LLMs' assessments in moral scenarios, though a systematic anti-AI bias was observed: participants were less likely to agree with judgments they believed to be machine-generated. Statistical and NLP-based analyses…
Taxonomy
Topics: Ethics and Social Impacts of AI · Psychology of Moral and Emotional Judgment
