RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1

Yu Xie; Xingkai Ren; Ying Qi; Yao Hu; Lianlei Shan

arXiv:2506.19235 · cs.AI · June 25, 2025

Open Access

TL;DR

RecLLM-R1 introduces a two-stage training framework for recommendation systems that combines supervised fine-tuning and reinforcement learning with chain-of-thought reasoning, improving accuracy, diversity, and business alignment.

Contribution

The paper presents a novel two-stage training paradigm for LLM-based recommendation systems, integrating reinforcement learning with chain-of-thought to optimize multiple objectives.

Findings

01. Outperforms baseline methods on real-world social media data
02. Enhances recommendation diversity and novelty
03. Mitigates filter bubble effects

Abstract

Traditional recommendation systems often grapple with "filter bubbles", underutilization of external knowledge, and a disconnect between model optimization and business policy iteration. To address these limitations, this paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs) and drawing inspiration from the DeepSeek R1 methodology. The framework begins by transforming user profiles, historical interactions, and multi-faceted item attributes into LLM-interpretable natural language prompts through a carefully engineered data construction process. A two-stage training paradigm is then employed: the first stage uses Supervised Fine-Tuning (SFT) to give the LLM fundamental recommendation capabilities. The second stage uses Group Relative Policy Optimization (GRPO), a reinforcement learning technique, augmented…
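To make the GRPO stage more concrete, the following is a minimal sketch of the group-relative advantage computation that gives GRPO its name: several candidate outputs are sampled for the same prompt, each receives a scalar reward, and every reward is normalized against the mean and standard deviation of its own group. It is an illustration under stated assumptions, not the paper's implementation. The function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    `rewards` holds one scalar reward per candidate output sampled for the
    same prompt. `eps` (an assumed constant) avoids division by zero when
    every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate recommendation lists
# generated from a single user prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)

# A group in which every candidate scores the same carries no signal.
flat = group_relative_advantages([0.3, 0.3, 0.3])
```

Candidates that score above their group's mean get a positive advantage and are reinforced, while those below the mean are discouraged. Because the baseline comes from the group itself, this formulation does not need a separately trained value model.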

Taxonomy

Topics: Complex Systems and Decision Making