UserLM-R1: Modeling Human Reasoning in User Language Models with Multi-Reward Reinforcement Learning

Feng Zhang; Shijia Li; Chunmao Zhang; Zhanyu Ma; Jun Xu; Jiuchong Gao; Jinghua Hao; Renqing He; Jingwen Xu; Han Liu

arXiv:2601.09215·cs.CL·January 15, 2026

TL;DR

UserLM-R1 is a user language model that reasons before responding and uses adaptive profiles, so that it can negotiate and interact across diverse scenarios. It targets two limitations of existing user simulators: static, context-unaware profiles and the neglect of human strategic thinking.

Contribution

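The paper introduces comprehensive user profiles that combine static roles with dynamic scenario-specific goals, and a goal-driven decision-making policy that generates a rationale before each response and is refined through reinforcement learning. Together these aim to improve generalizability and strategic reasoning.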

Findings

01. Outperforms baselines on challenging adversarial tasks

02. Demonstrates improved reasoning and strategic capabilities

03. Adapts effectively to diverse scenarios

Abstract

User simulators serve as the critical interactive environment for agent post-training, and an ideal user simulator generalizes across domains and proactively engages in negotiation by challenging or bargaining. However, current methods exhibit two issues. They rely on static and context-unaware profiles, necessitating extensive manual redesign for new scenarios, thus limiting generalizability. Moreover, they neglect human strategic thinking, leading to vulnerability to agent manipulation. To address these issues, we propose UserLM-R1, a novel user language model with reasoning capability. Specifically, we first construct comprehensive user profiles with both static roles and dynamic scenario-specific goals for adaptation to diverse scenarios. Then, we propose a goal-driven decision-making policy to generate high-quality rationales before producing responses, and further refine the…
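
The profile structure and the rationale-before-response pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `UserProfile` class, the `with_goal` and `build_prompt` helpers, the `<rationale>` tag format, and the prompt wording are all hypothetical names chosen for illustration. It shows only how a fixed role can be paired with a goal that is swapped per scenario, and how a model output could be split into a rationale and a user-facing reply.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UserProfile:
    """Hypothetical profile: a static role plus a scenario-specific goal."""
    static_role: str    # persona traits that stay fixed across scenarios
    scenario_goal: str  # what the simulated user wants in this scenario


def with_goal(profile: UserProfile, new_goal: str) -> UserProfile:
    """Reuse the same static role in a new scenario by swapping only the goal."""
    return replace(profile, scenario_goal=new_goal)


def build_prompt(profile: UserProfile, agent_message: str) -> str:
    """Assemble a prompt asking for a rationale first, then the reply.

    The wording and the <rationale> tags are assumptions, not the paper's format.
    """
    return (
        f"Role: {profile.static_role}\n"
        f"Goal: {profile.scenario_goal}\n"
        f"Agent said: {agent_message}\n"
        "First reason inside <rationale>...</rationale>, then write the reply."
    )


def split_rationale(output: str) -> tuple[str, str]:
    """Split a model output into (rationale, response).

    If no closing tag is present, the whole output is treated as the response.
    """
    head, sep, tail = output.partition("</rationale>")
    if not sep:
        return "", output.strip()
    rationale = head.replace("<rationale>", "", 1).strip()
    return rationale, tail.strip()
```

Keeping the role and the goal as separate fields means that moving to a new scenario only requires a new goal string, which is one simple way to avoid redesigning the whole profile by hand.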

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)