Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems
Zongwei Wang, Min Gao, Hongzhi Yin, Junliang Yu, Tong Chen, Quoc Viet Hung Nguyen, Shazia Sadiq, and Tianrui Li

TL;DR
This paper introduces CoARS, a self-distilled reinforcement learning framework that enhances agentic recommender systems by internalizing multi-turn interaction experience into model parameters, improving recommendation quality and user alignment.
Contribution
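The paper proposes an RL approach that combines an interaction reward with self-distilled credit assignment for co-evolving agentic recommender systems. It targets a limitation of existing Reflexion-style ARS: interaction experience is kept as external textual memory rather than learned into model parameters.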
Findings
CoARS outperforms baseline ARS models in recommendation accuracy.
The framework improves user alignment in multi-turn interactions.
Experimental results validate the effectiveness of the proposed methods.
Abstract
Large language model-empowered agentic recommender systems (ARS) reformulate recommendation as a multi-turn interaction between a recommender agent and a user agent, enabling iterative preference elicitation and refinement beyond conventional one-shot prediction. However, existing ARS are mainly optimized in a Reflexion-style paradigm, where past interaction trajectories are stored as textual memory and retrieved as prompt context for later reasoning. Although this design allows agents to recall prior feedback and observations, the accumulated experience remains external to model parameters, leaving agents reliant on generic reasoning rather than progressively acquiring recommendation-specific decision-making ability through learning. Reinforcement learning (RL) therefore provides a natural way to internalize such interaction experience into parameters. Yet existing RL methods for ARS…
