HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing
Chengyu Du, Xintao Wang, Aili Chen, Weiyuan Li, Rui Xu, Junteng Liu, Zishan Huang, Rong Tian, Zijun Sun, Yuhao Li, Liheng Feng, Deming Ding, Pengyu Zhao, Yanghua Xiao

TL;DR
HER is a cognitive simulation framework for LLM role-playing that uses dual-layer thinking, curated reasoning data, and human-aligned rewards to improve persona consistency and reasoning.
Contribution
The paper presents HER, a novel approach combining reasoning-augmented data and dual-layer thinking for improved LLM role-playing with human-like cognition.
Findings
HER models significantly outperform the Qwen3-32B baseline.
30.26-point improvement on the CoSER benchmark.
14.97% gain on the Minimax Role-Play Bench.
Abstract
LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation and digital games. While current models effectively capture character tones and knowledge, simulating the inner thoughts behind their behaviors remains a challenge. Towards cognitive simulation in LLM role-play, previous efforts mainly suffer from two deficiencies: lacking data with high-quality reasoning traces, and lacking reliable reward signals aligned with human preferences. In this paper, we propose HER, a unified framework for cognitive-level persona simulation. HER introduces dual-layer thinking, which distinguishes characters' first-person thinking from LLMs' third-person thinking. To bridge these gaps, we curate reasoning-augmented role-playing data via reverse engineering, and construct human-aligned principles and…
