Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang, Yulu Niki Pi, Carlos Mougan

TL;DR
This paper investigates how prompting can lead large language models to change their moral judgments and align with specified ethical frameworks, and finds that susceptibility varies with model size and scenario complexity.
Contribution
The paper introduces two experiments that assess LLMs' susceptibility to moral persuasion and to alignment with ethical frameworks, and it highlights factors that affect their responsiveness.
Findings
LLMs can be persuaded in morally charged scenarios
Susceptibility varies with model size and scenario complexity
Different models from the same company show different responses
Abstract
We explore how large language models (LLMs) can be influenced by prompting them to alter their initial decisions and align them with established ethical frameworks. Our study is based on two experiments designed to assess the susceptibility of LLMs to moral persuasion. In the first experiment, we examine the susceptibility to moral ambiguity by evaluating a Base Agent LLM on morally ambiguous scenarios and observing how a Persuader Agent attempts to modify the Base Agent's initial decisions. The second experiment evaluates the susceptibility of LLMs to align with predefined ethical frameworks by prompting them to adopt specific value alignments rooted in established philosophical theories. The results demonstrate that LLMs can indeed be persuaded in morally charged scenarios, with the success of persuasion depending on factors such as the model used, the complexity of the scenario, and…
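The first experiment's setup (a Base Agent that makes an initial decision and a Persuader Agent that tries to change it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `run_persuasion`, the prompt wording, the `max_rounds` parameter, and the rule of stopping at the first changed decision are all assumptions made for illustration, and the agents are plain Python callables standing in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: an agent maps a prompt string to a reply string.
# In a real study this would wrap an LLM API call.
Agent = Callable[[str], str]


@dataclass
class PersuasionResult:
    initial_decision: str
    final_decision: str
    rounds_used: int

    @property
    def changed(self) -> bool:
        # True if the Base Agent ended on a different decision than it started with.
        return self.initial_decision != self.final_decision


def run_persuasion(scenario: str, base_agent: Agent, persuader_agent: Agent,
                   max_rounds: int = 3) -> PersuasionResult:
    """Ask the base agent for a decision, then let the persuader argue against it.

    Assumption: the dialogue stops as soon as the decision changes or after
    max_rounds persuasion attempts, whichever comes first.
    """
    initial = base_agent(f"Scenario: {scenario}\nDecision:")
    current = initial
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        argument = persuader_agent(
            f"Scenario: {scenario}\nThe other agent chose: {current}\n"
            "Argue for a different choice:"
        )
        current = base_agent(
            f"Scenario: {scenario}\nYour earlier decision: {current}\n"
            f"Counterargument: {argument}\nDecision:"
        )
        if current != initial:
            break
    return PersuasionResult(initial, current, rounds)


# Stub agents for demonstration only; they contain no model logic.
def stub_persuader(prompt: str) -> str:
    return "Choosing A causes more harm overall."


def persuadable_base(prompt: str) -> str:
    # Switches to "B" once it sees a counterargument that mentions harm.
    return "B" if "Counterargument:" in prompt and "harm" in prompt else "A"


def stubborn_base(prompt: str) -> str:
    # Never changes its decision.
    return "A"


scenario = "Should the agent break a promise to help a stranger?"
swayed = run_persuasion(scenario, persuadable_base, stub_persuader)
unmoved = run_persuasion(scenario, stubborn_base, stub_persuader)
```

Running the same loop over many scenarios and averaging `changed` would give one possible susceptibility rate per model; whether the paper measures susceptibility this way is not stated in the abstract.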
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Ethics and Social Impacts of AI
Methods: Balanced Selection · ALIGN · ADaptive gradient method with the OPTimal convergence rate
