Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations
José Luiz Nunes, Guilherme F. C. F. Almeida, Marcelo de Araujo, Simone D. J. Barbosa

TL;DR
This study examines whether large language models like GPT-4 and Claude 2.1 exhibit moral hypocrisy by comparing their responses to abstract moral values and concrete moral scenarios using Moral Foundations Theory.
Contribution
The paper introduces a novel approach to assess LLMs' moral consistency by applying Moral Foundations Theory through two distinct research instruments.
Findings
LLMs show internal consistency within each instrument
Models exhibit contradictions between abstract values and concrete judgments
GPT-4 and Claude 2.1 display hypocritical moral behavior
Abstract
Large language models (LLMs) have taken centre stage in debates on Artificial Intelligence. Yet there remains a gap in how to assess LLMs' conformity to important human values. In this paper, we investigate whether state-of-the-art LLMs, GPT-4 and Claude 2.1 (Gemini Pro and LLAMA 2 did not generate valid results) are moral hypocrites. We employ two research instruments based on the Moral Foundations Theory: (i) the Moral Foundations Questionnaire (MFQ), which investigates which values are considered morally relevant in abstract moral judgements; and (ii) the Moral Foundations Vignettes (MFVs), which evaluate moral cognition in concrete scenarios related to each moral foundation. We characterise conflicts in values between these different abstractions of moral evaluation as hypocrisy. We found that both models displayed reasonable consistency within each instrument compared to humans,…
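The comparison the abstract describes, abstract MFQ endorsements set against concrete MFV judgements for each moral foundation, can be illustrated with a minimal sketch. This is not the authors' analysis. The function names, the rating ranges (0 to 5 for the MFQ, 0 to 4 for the MFVs), and the "gap" measure are all assumptions chosen for illustration, and the sample numbers are invented.

```python
from statistics import mean


def rescale(value, low, high):
    """Map a rating from the interval [low, high] onto [0, 1]."""
    return (value - low) / (high - low)


def abstraction_gaps(mfq, mfv, mfq_range=(0, 5), mfv_range=(0, 4)):
    """Return, per foundation, abstract endorsement minus concrete judgement.

    mfq and mfv map a foundation name to a list of numeric ratings.
    The default ranges are assumptions, not values taken from the paper.
    Only foundations present in both instruments are compared.
    """
    gaps = {}
    for foundation in sorted(set(mfq) & set(mfv)):
        abstract = rescale(mean(mfq[foundation]), *mfq_range)
        concrete = rescale(mean(mfv[foundation]), *mfv_range)
        gaps[foundation] = abstract - concrete
    return gaps


# Invented ratings, for illustration only.
example_mfq = {"care": [5, 5], "loyalty": [0, 0]}
example_mfv = {"care": [4, 4], "loyalty": [4, 4]}
example_gaps = abstraction_gaps(example_mfq, example_mfv)
```

In this toy input the gap for "care" is zero (the abstract and concrete ratings agree), while "loyalty" shows a large negative gap: the foundation is rated irrelevant in the abstract yet its violations are judged severely in concrete cases. A mismatch of this kind is what the abstract calls hypocrisy, although the paper's own measure may differ.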
Taxonomy
Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout
