Normative Evaluation of Large Language Models with Everyday Moral Dilemmas
Pratik S. Sachdeva, Tom van Nuenen

TL;DR
This paper evaluates large language models on complex, everyday moral dilemmas from Reddit and finds that their moral judgments vary across models and diverge from human judgments, highlighting the need for nuanced ethical assessment of AI systems.
Contribution
It introduces a detailed evaluation framework for LLMs using real-world moral dilemmas, uncovering patterns and inconsistencies in their moral reasoning compared to human judgments.
Findings
LLMs show moderate to high self-consistency in moral judgments.
Models exhibit low agreement with each other and with human evaluations.
Distinct patterns in moral reasoning are identified across different models.
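To make the first two findings concrete, the following is a minimal sketch of how self-consistency and inter-model agreement could be quantified. It is not the paper's method: the model names, the toy verdicts, and the two simple proportion-based measures are assumptions chosen for illustration (the labels YTA and NTA follow the AITA community's verdict convention).

```python
from collections import Counter

def modal_verdict(verdicts):
    """Most frequent verdict among repeated runs on one dilemma."""
    return Counter(verdicts).most_common(1)[0][0]

def self_consistency(verdicts):
    """Fraction of repeated runs that match the modal verdict."""
    top_count = Counter(verdicts).most_common(1)[0][1]
    return top_count / len(verdicts)

def pairwise_agreement(labels_a, labels_b):
    """Fraction of dilemmas on which two label sequences coincide."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical data: 2 models x 3 dilemmas x 3 repeated runs each.
runs = {
    "model_a": [["YTA", "YTA", "YTA"], ["NTA", "NTA", "YTA"], ["NTA", "NTA", "NTA"]],
    "model_b": [["NTA", "NTA", "NTA"], ["NTA", "NTA", "NTA"], ["YTA", "YTA", "NTA"]],
}

# Average self-consistency per model across dilemmas.
mean_consistency = {
    name: sum(self_consistency(v) for v in per_dilemma) / len(per_dilemma)
    for name, per_dilemma in runs.items()
}

# Agreement between the two models' modal verdicts.
modal = {name: [modal_verdict(v) for v in per_dilemma] for name, per_dilemma in runs.items()}
agreement = pairwise_agreement(modal["model_a"], modal["model_b"])

print(mean_consistency)
print(agreement)
```

In this toy data each model is fairly stable across its own runs, yet the two models agree on only one of the three dilemmas, which mirrors the pattern described in the findings. A chance-corrected statistic would be a more rigorous choice than raw proportions.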
Abstract
The rapid adoption of large language models (LLMs) has spurred extensive research into their encoded moral norms and decision-making processes. Much of this research relies on prompting LLMs with survey-style questions to assess how well models are aligned with certain demographic groups, moral beliefs, or political ideologies. While informative, the adherence of these approaches to relatively superficial constructs tends to oversimplify the complexity and nuance underlying everyday moral dilemmas. We argue that auditing LLMs along more detailed axes of human interaction is of paramount importance to better assess the degree to which they may impact human beliefs and actions. To this end, we evaluate LLMs on complex, everyday moral dilemmas sourced from the "Am I the Asshole" (AITA) community on Reddit, where users seek moral judgments on everyday conflicts from other community members.…
Taxonomy
Topics: Topic Modeling
