When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

Zhijing Jin; Sydney Levine; Fernando Gonzalez; Ojasv Kamal; Maarten Sap; Mrinmaya Sachan; Rada Mihalcea; Josh Tenenbaum; Bernhard Schölkopf

arXiv:2210.01478·cs.CL·October 28, 2022·25 cites

TL;DR

This paper introduces a challenge set and a prompting strategy that help large language models predict human moral judgments, particularly in cases where breaking a rule may be permissible, which the authors frame as a central challenge for AI safety.

Contribution

The paper presents a moral chain of thought prompting strategy (MORALCOT) and a rule-breaking question answering (RBQA) challenge set, improving how well LLMs model the flexibility of human moral judgment.

Findings

01 · MORALCOT outperforms seven LLMs by 6.2% F1

02 · The approach better captures human moral reasoning in rule-breaking cases

03 · The dataset and code are open-sourced for further research
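
The F1 comparison in the first finding can be made concrete with a small sketch. It assumes the task is scored as binary classification (1 = breaking the rule is judged permissible, 0 = not permissible) and that F1 is macro-averaged over the two labels; both are assumptions for illustration, and the labels and predictions below are made up rather than taken from the paper.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[int], pred: Sequence[int], label: int) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[int], pred: Sequence[int],
             labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


# Hypothetical human majority judgments and model predictions.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(round(macro_f1(gold, pred), 3))  # 0.733
```

Under this reading, a 6.2% gap means a difference of 0.062 in a score of this kind, computed on the same examples for each model.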

Abstract

AI systems are becoming increasingly intertwined with human life. In order to effectively collaborate with humans and ensure safety, AI systems need to be able to understand, interpret and predict human moral judgments and decisions. Human moral judgments are often guided by rules, but not always. A central challenge for AI safety is capturing the flexibility of the human moral mind -- the ability to determine when a rule should be broken, especially in novel or unusual situations. In this paper, we present a novel challenge set consisting of rule-breaking question answering (RBQA) of cases that involve potentially permissible rule-breaking -- inspired by recent moral psychology studies. Using a state-of-the-art large language model (LLM) as a basis, we propose a novel moral chain of thought (MORALCOT) prompting strategy that combines the strengths of LLMs with theories of moral…

Code & Models

Repositories

feradauto/moralcot · Official

Datasets

feradauto/MoralExceptQA
dataset· 494 dl

Videos

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI