Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Davi Bastos Costa; Felippe Alves; Renato Vicente

arXiv:2511.08565·cs.CL·May 15, 2026

TL;DR

This paper introduces a benchmark based on the Moral Foundations Questionnaire to measure how the moral judgments of large language models shift when they assume different personas. It finds large differences in moral robustness across model families, but comparatively little variation in moral susceptibility.

Contribution

It presents a novel benchmark and analysis method for measuring moral susceptibility and robustness in LLMs under persona role-play, highlighting the influence of training stages and model families.

Findings

01

Moral robustness varies greatly across model families, with Claude being most robust.

02

Moral susceptibility shows little variation across models and is likely influenced by pre-training.

03

Model family explains most of the variance in robustness, but not in susceptibility.

Abstract

Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LLMs to persona role-play, in which an LLM is prompted to assume a specific character. Using the Moral Foundations Questionnaire (MFQ), we introduce a benchmark that quantifies two properties, moral susceptibility and moral robustness, defined from the variability of MFQ scores across and within personas. We estimate these quantities with two complementary procedures: repeated sampling, and a logit-based method that directly estimates the rating distributions and enables temperature analysis. We evaluate 15 models across six families: Claude, DeepSeek, Gemini, GPT, Grok, and Llama. The two metrics show qualitatively different patterns. Moral robustness varies by more than an order of magnitude, with a coefficient…
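The abstract gives only a verbal description of the two metrics, so the following is a minimal sketch under stated assumptions rather than the authors' exact definitions. It assumes susceptibility can be approximated by the standard deviation of persona-level mean ratings (across-persona variability) and robustness by the inverse of the average within-persona standard deviation. The function names, the `eps` guard, and the input layout are hypothetical. The last function illustrates the general idea of a logit-based estimate: a temperature-scaled softmax over the logits of the rating tokens, assuming those logits are available from the model.

```python
import math
from statistics import mean, pstdev


def persona_stats(samples):
    """Return {persona: (mean rating, std of ratings)}.

    `samples` maps a persona name to a list of repeated ratings for one
    MFQ item (hypothetical layout, chosen for illustration).
    """
    return {p: (mean(r), pstdev(r)) for p, r in samples.items()}


def susceptibility(samples):
    # Assumed definition: spread of persona-level means (across personas).
    means = [m for m, _ in persona_stats(samples).values()]
    return pstdev(means)


def robustness(samples, eps=1e-9):
    # Assumed definition: inverse of the average within-persona spread.
    # `eps` avoids division by zero when every persona answers identically.
    stds = [s for _, s in persona_stats(samples).values()]
    return 1.0 / (mean(stds) + eps)


def rating_distribution(logits, temperature=1.0):
    # Temperature-scaled softmax over rating-token logits.
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy = {"judge": [0, 2], "pirate": [3, 5]}  # made-up ratings
    print(susceptibility(toy), robustness(toy))
    print(rating_distribution([0.0, 1.0, 2.0], temperature=0.5))
```

Under these assumptions, a model whose personas disagree strongly with each other scores high on susceptibility, while a model whose repeated answers within one persona barely vary scores high on robustness. Lowering `temperature` concentrates the rating distribution on the highest-logit rating.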
