Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment

Weixiang Zhao; Xingyu Sui; Yulin Hu; Jiahe Guo; Haixiao Liu; Biye Li; Yanyan Zhao; Bing Qin; Ting Liu

arXiv:2505.15456 · cs.CL · December 12, 2025

Open Access

TL;DR

This paper introduces RLPA, a reinforcement learning framework that enables large language models to dynamically infer and refine user profiles through dialogue, significantly improving personalized interaction quality and robustness.

Contribution

The paper proposes RLPA, a novel reinforcement learning approach for dynamic user profile modeling in LLMs, achieving state-of-the-art personalized dialogue performance.

Findings

01. Qwen-RLPA outperforms prompting and offline fine-tuning baselines.

02. Qwen-RLPA surpasses models like Claude-3.5 and GPT-4o in personalization.

03. The method enhances long-term personalization and robustness.

Abstract

Personalized alignment is essential for enabling large language models (LLMs) to engage effectively in user-centric dialogue. While recent prompt-based and offline optimization methods offer preliminary solutions, they fall short in cold-start scenarios and long-term personalization due to their inherently static and shallow designs. In this work, we introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework, in which an LLM interacts with a simulated user model to iteratively infer and refine user profiles through dialogue. The training process is guided by a dual-level reward structure: the Profile Reward encourages accurate construction of user representations, while the Response Reward incentivizes generation of responses consistent with the inferred profile. We instantiate RLPA by fine-tuning Qwen-2.5-3B-Instruct, resulting in Qwen-RLPA, which achieves…
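
The dual-level reward described in the abstract can be illustrated with a small sketch. The abstract does not say how either reward is computed or how the two are combined, so everything below is an assumption made for illustration: the profile is modeled as a flat attribute dictionary, the Profile Reward as a slot-level F1 against a reference profile held by the simulated user, the Response Reward as a toy substring check, and the combination as a weighted sum. The function names and the `profile_weight` parameter are hypothetical and are not taken from the paper.

```python
def profile_reward(inferred: dict[str, str], reference: dict[str, str]) -> float:
    """Assumed metric: slot-level F1 between inferred and reference profiles."""
    if not inferred or not reference:
        return 0.0
    # A slot counts as correct only if both key and value match the reference.
    correct = sum(1 for key, value in inferred.items() if reference.get(key) == value)
    if correct == 0:
        return 0.0
    precision = correct / len(inferred)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)


def response_reward(response: str, inferred: dict[str, str]) -> float:
    """Toy stand-in for a consistency judge: the fraction of inferred
    profile values that the response mentions (case-insensitive)."""
    if not inferred:
        return 0.0
    text = response.lower()
    hits = sum(1 for value in inferred.values() if value.lower() in text)
    return hits / len(inferred)


def dual_level_reward(
    response: str,
    inferred: dict[str, str],
    reference: dict[str, str],
    profile_weight: float = 0.5,  # hypothetical mixing weight
) -> float:
    """Assumed combination: weighted sum of the two reward levels."""
    p = profile_reward(inferred, reference)
    r = response_reward(response, inferred)
    return profile_weight * p + (1.0 - profile_weight) * r


if __name__ == "__main__":
    reference = {"hobby": "hiking", "city": "Harbin"}   # simulated user's true profile
    inferred = {"hobby": "hiking", "city": "Beijing"}   # model's current guess
    reply = "Since you enjoy hiking, here is a trail you might like."
    print(dual_level_reward(reply, inferred, reference))
```

In this sketch the Profile Reward scores the guess against the simulated user's reference profile, while the Response Reward scores the reply only against the model's own inferred profile, mirroring the split the abstract describes. A real system would likely replace the substring check with a learned or LLM-based judge.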

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems