Are Language Models Sensitive to Morally Irrelevant Distractors?
Andrew Shaw, Christina Hahn, Catherine Rasgaitis, Yash Mishra, Alisa Liu, Natasha Jaques, Yulia Tsvetkov, Amy X. Zhang

TL;DR
This paper investigates whether large language models (LLMs) are influenced by morally irrelevant situational factors. It finds that such distractors can significantly alter the models' moral judgments, much as situational factors bias human moral judgments.
Contribution
The study introduces a novel multimodal dataset of 60 moral distractors and demonstrates their impact on LLM moral judgments, highlighting the need for more nuanced evaluation methods.
Findings
Moral distractors can shift LLM judgments by over 30%.
LLMs exhibit sensitivity to morally irrelevant situational factors.
Current benchmarks may not fully capture LLMs' moral reasoning.
Abstract
With the rapid development and uptake of large language models (LLMs) across high-stakes settings, it is increasingly important to ensure that LLMs behave in ways that align with human values. Existing moral benchmarks prompt LLMs with value statements, moral scenarios, or psychological questionnaires, with the implicit underlying assumption that LLMs report somewhat stable moral preferences. However, moral psychology research has shown that human moral judgements are sensitive to morally irrelevant situational factors, such as smelling cinnamon rolls or the level of ambient noise, thereby challenging moral theories that assume the stability of human moral judgements. Here, we draw inspiration from this "situationist" view of moral psychology to evaluate whether LLMs exhibit similar cognitive moral biases to humans. We curate a novel multimodal dataset of 60 "moral distractors" from…
Taxonomy
Topics: Psychology of Moral and Emotional Judgment · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
