Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation

Hongxun Ding; Keqin Bao; Jizhi Zhang; Yi Fang; Wenxin Xu; Fuli Feng; Xiangnan He

arXiv:2602.00632·cs.IR·February 3, 2026

Open Access

TL;DR

This paper introduces RISER, a reinforcement learning framework designed to improve sample efficiency and stability in LLM-based recommendation systems, addressing limitations of Chain-of-Thought reasoning.

Contribution

The paper proposes RISER, a novel RL-based method that transforms trajectories into preference data and incorporates stability strategies, advancing RL application in recommendation systems.
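As a rough illustration of what "transforming trajectories into preference data" can look like, the sketch below pairs higher-reward rollouts with lower-reward rollouts sampled for the same prompt. It is not taken from the paper: the `Rollout` structure, the `to_preference_pairs` helper, the binary reward, and the all-pairs rule are assumptions made for this example, and RISER's actual construction may differ.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rollout:
    """One sampled recommendation for a prompt (hypothetical structure)."""
    prompt: str
    item: str
    reward: float  # assumed: 1.0 if the item matches the held-out next item, else 0.0


def to_preference_pairs(rollouts):
    """Pair each higher-reward rollout with each lower-reward rollout
    sampled for the same prompt; returns (prompt, chosen, rejected) tuples."""
    by_prompt = {}
    for r in rollouts:
        by_prompt.setdefault(r.prompt, []).append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in product(group, repeat=2):
            # Only a strict reward gap between two distinct items yields a pair.
            if a.reward > b.reward and a.item != b.item:
                pairs.append((prompt, a.item, b.item))
    return pairs


rollouts = [
    Rollout("user_1 history", "item_A", 1.0),
    Rollout("user_1 history", "item_B", 0.0),
    Rollout("user_1 history", "item_C", 0.0),
    Rollout("user_2 history", "item_D", 0.0),  # no positive sample for this prompt
    Rollout("user_2 history", "item_E", 0.0),
]
pairs = to_preference_pairs(rollouts)
print(pairs)
```

In this toy input the second prompt produces no pair because none of its rollouts earned a reward, a small-scale picture of the situation the abstract describes, where most sampled actions carry no learning signal.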

Findings

01. RISER significantly outperforms baselines on real-world datasets.

02. It enhances sample efficiency in RL for recommendations.

03. RISER ensures training stability through specific strategies.

Abstract

Long Chain-of-Thought (Long CoT) reasoning has shown promise in Large Language Models (LLMs), and its adoption for enhancing recommendation quality is growing rapidly. In this work, we critically examine this trend and argue that Long CoT is inherently ill-suited for the sequential recommendation domain. We attribute this misalignment to two primary factors: excessive inference latency and the lack of explicit cognitive reasoning patterns in user behavioral data. Driven by these observations, we propose pivoting away from the CoT structure to directly leverage its underlying mechanism, Reinforcement Learning (RL), to explore the item space. However, applying RL directly faces significant obstacles, notably low sample efficiency (where most actions fail to provide learning signals) and training instability. To overcome these limitations, we propose RISER, a novel Reinforced Item Space…

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)