Too Good to be Bad: On the Failure of LLMs to Role-Play Villains

Zihao Yi; Qingxuan Jiang; Ruotian Ma; Xingyu Chen; Qu Yang; Mengru Wang; Fanghua Ye; Ying Shen; Zhaopeng Tu; Xiaolong Li; Linus

arXiv:2511.04962 · cs.CL · November 13, 2025

TL;DR

This paper examines how safety alignment hinders large language models from authentically role-playing morally villainous characters, and reports that role-playing fidelity declines as characters become more malevolent.

Contribution

The paper introduces the Moral RolePlay benchmark and systematically evaluates LLMs' ability to portray villains, highlighting how safety alignment affects creative role-playing.

Findings

01 Models struggle with traits like deceit and manipulation.

02 Safety-aligned models perform poorly in villain role-playing.

03 Fidelity declines as moral alignment becomes more villainous.

Abstract

Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic personas remains largely unexamined. We hypothesize that the safety alignment of modern LLMs creates a fundamental conflict with the task of authentically role-playing morally ambiguous or villainous characters. To investigate this, we introduce the Moral RolePlay benchmark, a new dataset featuring a four-level moral alignment scale and a balanced test set for rigorous evaluation. We task state-of-the-art LLMs with role-playing characters from moral paragons to pure villains. Our large-scale evaluation reveals a consistent, monotonic decline in role-playing fidelity as character morality decreases. We find that models struggle most with traits directly antithetical to safety principles, such as…
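To make the "consistent, monotonic decline" claim concrete, the sketch below shows one way such a trend could be checked given one mean fidelity score per moral level, ordered from moral paragons (level 1) to pure villains (level 4). The function names and the numbers in the usage lines are hypothetical and purely illustrative; they are not the paper's evaluation code or its results.

```python
from typing import Sequence, Tuple


def is_monotonic_decline(scores: Sequence[float], strict: bool = True) -> bool:
    """Return True if every score is lower than (or, when strict is False,
    no higher than) the score of the preceding moral level."""
    pairs = list(zip(scores, scores[1:]))
    if strict:
        return all(prev > cur for prev, cur in pairs)
    return all(prev >= cur for prev, cur in pairs)


def largest_drop(scores: Sequence[float]) -> Tuple[int, float]:
    """Return (level, drop) for the largest fall between adjacent levels.

    `level` is the 1-based level the drop starts from, so (2, 1.5) means
    the score fell by 1.5 between level 2 and level 3.
    Requires at least two scores.
    """
    drops = [prev - cur for prev, cur in zip(scores, scores[1:])]
    idx = max(range(len(drops)), key=drops.__getitem__)
    return idx + 1, drops[idx]


# Placeholder values for levels 1..4 (NOT results from the paper).
example_scores = [4.0, 3.5, 2.0, 1.5]
print(is_monotonic_decline(example_scores))
print(largest_drop(example_scores))
```

A per-level check like this only describes the shape of the trend; it says nothing about statistical significance, which would need the per-sample scores.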

Code & Models

Datasets

Zihao1/Moral-RolePlay
dataset · 37 downloads

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Multimodal Machine Learning Applications