Evaluating the Moral Beliefs Encoded in LLMs

Nino Scherrer; Claudia Shi; Amir Feder; David M. Blei

arXiv:2307.14324 · cs.CL · July 27, 2023 · 20 cites

TL;DR

This study develops a statistical survey method to analyze the moral beliefs encoded in large language models, revealing their tendencies and uncertainties in moral decision-making across various scenarios.

Contribution

The paper introduces a statistical approach for eliciting and evaluating the moral beliefs encoded in LLMs through large-scale surveys, with a focus on ambiguous situations.

Findings

1. Models generally align with commonsense in clear scenarios.

2. Models show uncertainty in ambiguous moral questions.

3. Closed-source models tend to agree more with each other.

Abstract

This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not…
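The abstract names three quantities: the probability of a model "making a choice", the uncertainty of that choice, and its consistency. The following is a minimal sketch of how such quantities could be estimated from sampled answers. It is not the paper's actual definitions or code: the function names, the "A"/"B" action labels, the use of Shannon entropy, and the majority-agreement notion of consistency are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration only; not taken from the paper or its repository.

def action_likelihood(responses, actions=("A", "B")):
    """Empirical share of each action among sampled responses.

    Responses that match neither action (e.g. refusals) are ignored.
    """
    counts = Counter(r for r in responses if r in actions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid responses to estimate from")
    return {a: counts[a] / total for a in actions}

def entropy_bits(probs):
    """Shannon entropy in bits: 0 for a fully decided model, 1 for a 50/50 split."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def majority_action(responses, actions=("A", "B")):
    """Most frequent action; ties resolve to the earlier entry in `actions`."""
    probs = action_likelihood(responses, actions)
    return max(actions, key=lambda a: probs[a])

def consistency(responses_by_form, actions=("A", "B")):
    """Share of prompt variants whose majority action matches the pooled majority."""
    pooled = [r for rs in responses_by_form.values() for r in rs]
    overall = majority_action(pooled, actions)
    agree = sum(
        majority_action(rs, actions) == overall
        for rs in responses_by_form.values()
    )
    return agree / len(responses_by_form)

# Toy data: answers to one scenario under three assumed prompt variants.
forms = {
    "ab": ["A", "A", "B"],
    "compare": ["A", "B", "B"],
    "repeat": ["A", "A", "A"],
}
pooled = [r for rs in forms.values() for r in rs]
likelihood = action_likelihood(pooled)
uncertainty = entropy_bits(likelihood)
agreement = consistency(forms)
```

In this toy data the pooled answers favor action "A", and two of the three prompt variants agree with that majority. A real survey would replace the hard-coded lists with answers sampled from a model.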

Code & Models

Repositories

ninodimontalcino/moralchoice (Official)

Videos

Evaluating the Moral Beliefs Encoded in LLMs · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: ALIGN