HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing

Chengyu Du; Xintao Wang; Aili Chen; Weiyuan Li; Rui Xu; Junteng Liu; Zishan Huang; Rong Tian; Zijun Sun; Yuhao Li; Liheng Feng; Deming Ding; Pengyu Zhao; Yanghua Xiao

arXiv:2601.21459·cs.LG·April 30, 2026


3 Models · 1 Dataset

TL;DR

HER introduces a cognitive simulation framework for LLM role-playing, utilizing dual-layer thinking, curated reasoning data, and human-aligned rewards to enhance persona consistency and reasoning capabilities.

Contribution

The paper presents HER, a novel approach combining reasoning-augmented data and dual-layer thinking for improved LLM role-playing with human-like cognition.

Findings

01. HER models significantly outperform the Qwen3-32B baseline.

02. A 30.26-point improvement on the CoSER benchmark.

03. A 14.97% gain on the Minimax Role-Play Bench.

Abstract

LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM role-play, previous efforts mainly suffer from two deficiencies: lacking data with high-quality reasoning traces, and lacking reliable reward signals aligned with human preferences. In this paper, we propose HER, a unified framework for cognitive-level persona simulation. HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking. To bridge these gaps, we curate reasoning-augmented role-playing data via reverse engineering, and construct human-aligned principles and…


Code & Models

Models

Datasets

ChengyuDu0123/HER-Dataset
dataset · 525 dl
