Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment

Allison Huang; Yulu Niki Pi; Carlos Mougan

arXiv:2411.11731·cs.CL·November 19, 2024

TL;DR

This paper investigates how prompting can lead large language models to change their moral judgments and align with ethical frameworks, and finds that susceptibility varies with model size and scenario complexity.

Contribution

The paper introduces two experiments that assess LLMs' susceptibility to moral persuasion and to alignment with ethical frameworks, and it highlights factors that affect their responsiveness.

Findings

1. LLMs can be persuaded in morally charged scenarios

2. Susceptibility varies with model size and scenario complexity

3. Different models from the same company show different responses

Abstract

We explore how large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks. Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion. In the first experiment, we examine the susceptibility to moral ambiguity by evaluating a Base Agent LLM on morally ambiguous scenarios and observing how a Persuader Agent attempts to modify the Base Agent's initial decisions. The second experiment evaluates the susceptibility of LLMs to align with predefined ethical frameworks by prompting them to adopt specific value alignments rooted in established philosophical theories. The results demonstrate that LLMs can indeed be persuaded in morally charged scenarios, with the success of persuasion depending on factors such as the model used, the complexity of the scenario, and…
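
The first experiment's setup, a Base Agent whose initial decision a Persuader Agent tries to change, can be sketched as a simple loop. This is a minimal illustration only: the abstract does not give the prompts, the number of turns, or the stopping rule, so the `Agent` type, `run_persuasion`, `max_turns`, the prompt strings, and the stub agents below are all hypothetical, and stopping at the first changed decision is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: an agent maps a text prompt to a text reply.
# In practice each agent would wrap a call to an LLM.
Agent = Callable[[str], str]


@dataclass
class PersuasionResult:
    initial_decision: str
    final_decision: str
    turns_used: int

    @property
    def changed(self) -> bool:
        # True if the Base Agent ended on a different decision.
        return self.initial_decision != self.final_decision


def run_persuasion(base_agent: Agent, persuader_agent: Agent,
                   scenario: str, max_turns: int = 3) -> PersuasionResult:
    """Record the Base Agent's initial decision, then let the Persuader argue."""
    initial = base_agent(f"Scenario: {scenario}")
    current = initial
    turns = 0
    for _ in range(max_turns):
        argument = persuader_agent(
            f"Scenario: {scenario}\nDecision so far: {current}"
        )
        turns += 1
        current = base_agent(
            f"Scenario: {scenario}\nCounterargument: {argument}"
        )
        if current != initial:  # assumed stopping rule, not from the paper
            break
    return PersuasionResult(initial, current, turns)


# Deterministic stand-ins so the sketch runs without any model access.
def stub_base(prompt: str) -> str:
    return "B" if "Counterargument:" in prompt else "A"


def stub_persuader(prompt: str) -> str:
    return "Consider option B instead."


result = run_persuasion(stub_base, stub_persuader, "a morally ambiguous dilemma")
```

With the stubs above, the Base Agent answers `A` first and switches to `B` after one counterargument, so `result.changed` is true. Comparing `initial_decision` and `final_decision` across many scenarios is one way a persuasion rate could be tallied under these assumptions.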

Code & Models

Repositories

acyhuang/moral-persuasion (official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Ethics and Social Impacts of AI

Methods: Balanced Selection · ALIGN · ADaptive gradient method with the OPTimal convergence rate