Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

Jan-Philipp Fränken; Kanishk Gandhi; Tori Qiu; Ayesha Khawaja; Noah D. Goodman; Tobias Gerstenberg

arXiv:2404.10975·cs.CL·April 18, 2024·1 cites

TL;DR

This paper introduces a framework for generating diverse moral dilemmas to evaluate moral reasoning in humans and AI, revealing patterns in judgments and highlighting areas for improvement in scenario design.

Contribution

It presents a novel procedural generation method for moral dilemmas and a large benchmark dataset for assessing moral reasoning in language models.

Findings

01

Both humans and models rated harm as less permissible when it was a necessary means.

02

Harmful outcomes that were evitable received lower permissibility ratings than inevitable ones.

03

There was no significant difference based on whether the harm resulted from an action or an omission.

Abstract

As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a language model to translate causal graphs that capture key aspects of moral dilemmas into prompt templates. With this framework, we procedurally generated a large and diverse set of moral dilemmas -- the OffTheRails benchmark -- consisting of 50 scenarios and 400 unique test items. We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions. We find that moral dilemmas in which the harm is a necessary means (as compared to a side effect)…
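The item counts in the abstract (50 scenarios, eight conditions, 400 test items) are consistent with a fully crossed design. The sketch below is only an illustration of that arithmetic, not the authors' pipeline: it assumes the eight conditions arise from crossing three binary factors (means vs. side effect, evitable vs. inevitable, action vs. omission), which are the contrasts named in the findings. The factor names, level labels, and the `build_items` helper are hypothetical.

```python
from itertools import product

# Hypothetical factor names and levels. The three contrasts are taken from the
# findings; crossing them into a 2 x 2 x 2 design is an assumption.
FACTORS = {
    "causal_structure": ("means", "side_effect"),
    "evitability": ("evitable", "inevitable"),
    "action": ("action", "omission"),
}

N_SCENARIOS = 50  # number of scenarios stated in the abstract


def build_items(n_scenarios=N_SCENARIOS):
    """Cross every scenario with every condition to enumerate test items."""
    conditions = [
        dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())
    ]
    return [
        {"scenario_id": scenario_id, **condition}
        for scenario_id in range(n_scenarios)
        for condition in conditions
    ]


items = build_items()
print(len(items))  # 400
```

Under this assumption each scenario contributes one item per condition, so every item is identified by its scenario and its three factor levels.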

Code & Models

Repositories

cicl-stanford/moral-evals (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training