Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation
Hongxun Ding, Keqin Bao, Jizhi Zhang, Yi Fang, Wenxin Xu, Fuli Feng, Xiangnan He

TL;DR
This paper introduces RISER, a reinforcement learning framework designed to improve sample efficiency and stability in LLM-based recommendation systems, addressing limitations of Chain-of-Thought reasoning.
Contribution
The paper proposes RISER, a novel RL-based method that transforms trajectories into preference data and incorporates stability strategies, advancing RL application in recommendation systems.
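The summary does not say how RISER turns trajectories into preference data. As a rough illustration of the general idea only, and not the authors' procedure, the sketch below pairs sampled recommendation rollouts so that a higher-reward rollout is "chosen" over a lower-reward one. The names `Rollout`, `PreferencePair`, `rollouts_to_pairs`, and the `margin` parameter are hypothetical, as is the binary hit-or-miss reward.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class Rollout(NamedTuple):
    """One sampled recommendation for a user (hypothetical structure)."""
    items: Tuple[str, ...]  # recommended item IDs, illustrative only
    reward: float           # assumed reward, e.g. 1.0 on a hit, 0.0 on a miss


class PreferencePair(NamedTuple):
    chosen: Tuple[str, ...]
    rejected: Tuple[str, ...]


def rollouts_to_pairs(rollouts: List[Rollout], margin: float = 0.0) -> List[PreferencePair]:
    """Pair every rollout with each rollout whose reward is lower by more than `margin`."""
    pairs = []
    # permutations yields ordered pairs (a, b) with a and b at different positions.
    for a, b in permutations(rollouts, 2):
        if a.reward - b.reward > margin:
            pairs.append(PreferencePair(chosen=a.items, rejected=b.items))
    return pairs


if __name__ == "__main__":
    sampled = [
        Rollout(items=("item_1",), reward=1.0),
        Rollout(items=("item_2",), reward=0.0),
        Rollout(items=("item_3",), reward=0.0),
    ]
    for pair in rollouts_to_pairs(sampled):
        print(pair)
```

Under this scheme a batch in which every rollout earns the same reward produces no pairs at all, which is one simple way to picture the sample-efficiency problem the paper raises: actions that do not differ in reward carry no learning signal.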
Findings
RISER significantly outperforms baselines on real-world datasets.
It enhances sample efficiency in RL for recommendations.
RISER ensures training stability through specific strategies.
Abstract
Long Chain-of-Thought (Long CoT) reasoning has shown promise in Large Language Models (LLMs), and its adoption for enhancing recommendation quality is growing rapidly. In this work, we critically examine this trend and argue that Long CoT is inherently ill-suited for the sequential recommendation domain. We attribute this misalignment to two primary factors: excessive inference latency and the lack of explicit cognitive reasoning patterns in user behavioral data. Driven by these observations, we propose pivoting away from the CoT structure and directly leveraging its underlying mechanism, Reinforcement Learning (RL), to explore the item space. However, applying RL directly faces significant obstacles, notably low sample efficiency (most actions fail to provide learning signals) and training instability. To overcome these limitations, we propose RISER, a novel Reinforced Item Space…
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
