The Fragility Of Moral Judgment In Large Language Models

Tom van Nuenen; Pratik S. Sachdeva

arXiv:2603.05651·cs.CL·March 9, 2026

Open Access

TL;DR

This study tests how stable large language models' moral judgments remain under content perturbations. It finds that the judgments are fragile and depend on narrative voice and presentation, which raises concerns about reproducibility and fairness.

Contribution

Introduces a perturbation framework to test LLM moral judgment stability, highlighting the influence of narrative cues and presentation on moral assessments.

Findings

01. Surface noise causes minimal judgment flips (~7.5%)

02. Perspective shifts significantly increase instability (~24.3%)

03. Judgments are heavily influenced by narrative voice and presentation

Abstract

People increasingly use large language models (LLMs) for everyday moral and interpersonal guidance, yet these systems cannot interrogate missing context and judge dilemmas as presented. We introduce a perturbation framework for testing the stability and manipulability of LLM moral judgments while holding the underlying moral conflict constant. Using 2,939 dilemmas from r/AmItheAsshole (January-March 2025), we generate three families of content perturbations: surface edits (lexical/structural noise), point-of-view shifts (voice and stance neutralization), and persuasion cues (self-positioning, social proof, pattern admissions, victim framing). We also vary the evaluation protocol (output ordering, instruction placement, and unstructured prompting). We evaluated all variants with four models (GPT-4.1, Claude 3.7 Sonnet, DeepSeek V3, Qwen2.5-72B) (N=129,156 judgments). Surface…
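The flip percentages reported above suggest a stability metric of roughly this shape: compare each perturbed variant's verdict with the verdict on the unperturbed dilemma, and report the share that differ per perturbation family. The sketch below is a minimal illustration under that assumption, not the authors' implementation. The function name `flip_rates`, the data layout, the family names, and the verdict labels ("NTA", "YTA") are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

def flip_rates(baseline, perturbed):
    """Share of perturbed judgments that differ from the baseline verdict.

    baseline:  dict mapping dilemma id -> verdict on the original text
    perturbed: iterable of (dilemma id, perturbation family, verdict)
    Both structures are assumed for illustration only.
    """
    flips = defaultdict(int)
    totals = defaultdict(int)
    for dilemma_id, family, verdict in perturbed:
        if dilemma_id not in baseline:
            continue  # no reference verdict to compare against
        totals[family] += 1
        if verdict != baseline[dilemma_id]:
            flips[family] += 1
    return {family: flips[family] / totals[family] for family in totals}

# Toy data with hypothetical verdict labels.
baseline = {"d1": "NTA", "d2": "YTA"}
perturbed = [
    ("d1", "surface", "NTA"),
    ("d2", "surface", "YTA"),
    ("d1", "point_of_view", "YTA"),
    ("d2", "point_of_view", "YTA"),
]
rates = flip_rates(baseline, perturbed)
print(rates)
```

Grouping by family keeps the comparison in the form the findings use, with one rate for surface edits and another for point-of-view shifts. The same function would apply per model if the records were filtered by model first.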

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education