RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward
Zongsheng Wang, Kaili Sun, Bowen Wu, Qun Yu, Ying Li, Baoxun Wang

TL;DR
This paper introduces RAIDEN-R1, a reinforcement learning framework with verifiable rewards that improves role consistency and reasoning in role-playing conversational agents; its 14B-GRPO model outperforms baseline models on the RAIDEN benchmark.
Contribution
The paper presents a GRPO-based reinforcement learning method with a Verifiable Role-Awareness Reward (VRAR), together with a high-quality role-aware Chain-of-Thought dataset, to improve role consistency in role-playing conversational agents (RPCAs).
Findings
The 14B-GRPO model achieved 88.04% accuracy on Script-Based Knowledge and 88.65% on Conversation Memory
Enhanced reasoning coherence and role consistency
Outperformed baseline models on RAIDEN benchmark
Abstract
Role-playing conversational agents (RPCAs) face persistent challenges in maintaining role consistency. To address this, we propose RAIDEN-R1, a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). The method introduces both singular and multi-term mining strategies to generate quantifiable rewards by assessing role-specific keys. Additionally, we construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration, and implement experiments to enhance reasoning coherence. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority: our 14B-GRPO model achieves 88.04% and 88.65% accuracy on Script-Based Knowledge and Conversation Memory metrics, respectively, outperforming baseline models while maintaining robustness. Case analyses further reveal the model's enhanced ability to resolve conflicting contextual…
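The abstract describes rewards that are computed by checking role-specific keys and then used with GRPO. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the substring-matching rule, and the fraction-of-keys scoring for the multi-term case are all assumptions made for illustration; the only part taken from standard GRPO is standardizing rewards within a group of sampled responses.

```python
import re
from statistics import mean, pstdev


def _normalize(text):
    """Lowercase and collapse whitespace so matching is case-insensitive."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def single_term_reward(response, key):
    """Hypothetical single-key reward: 1.0 if the key appears, else 0.0."""
    return 1.0 if _normalize(key) in _normalize(response) else 0.0


def multi_term_reward(response, keys):
    """Hypothetical multi-key reward: fraction of keys found in the response.

    The paper's actual scoring rule is not given in the abstract; this
    fraction is an assumption.
    """
    if not keys:
        return 0.0
    text = _normalize(response)
    hits = sum(1 for k in keys if _normalize(k) in text)
    return hits / len(keys)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group.

    Uses the population standard deviation; eps avoids division by zero
    when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative data only: four sampled answers to one role-play question.
    keys = ["doctor", "Kyoto"]
    group = [
        "I am a doctor and I grew up in Kyoto.",
        "I work as a doctor.",
        "I have always lived in Kyoto.",
        "I am a sailor from Osaka.",
    ]
    rewards = [multi_term_reward(r, keys) for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the reward depends only on string checks against known keys, it can be verified without a learned reward model, which is the property the "verifiable" label refers to. Responses that score above the group mean receive positive advantages and the rest receive negative ones.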
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Business Process Modeling and Analysis · Software System Performance and Reliability
