Reinforced Preference Optimization for Reasoning-Augmented Recommendations

Jingtong Gao; Zeyu Song; Chi Lu; Xiaopeng Li; Derong Xu; Maolin Wang; Peng Jiang; Kun Gai; Qingpeng Cai; Xiangyu Zhao

arXiv:2605.21967·cs.IR·May 22, 2026

TL;DR

This paper introduces RPORec, a reinforcement learning framework that combines reasoning-augmented language models with a dedicated recommendation head to improve personalized recommendations.

Contribution

The paper proposes a unified approach that aligns an LLM's reasoning process with recommendation objectives, aiming to improve accuracy and interpretability in recommender systems.

Findings

01 RPORec outperforms existing LLM-based recommendation methods on benchmarks.

02 The framework improves reasoning quality and structural consistency.

03 Online deployment shows significant performance gains.

Abstract

Recommender systems are critical for delivering personalized content across digital platforms, and recent advances in Large Language Models (LLMs) offer new opportunities to enhance them with richer world knowledge and explicit reasoning capabilities. With the help of reasoning knowledge, recommendations can better infer users' underlying intents, adapt to evolving preferences, and leverage semantic relationships for improved accuracy and interpretability. However, existing reasoning-based recommendation methods often fail to fully align the LLM's reasoning process with recommendation-specific objectives due to structural disruption during integration and difficulties in translating free-form generation into accurate item predictions. In this paper, we introduce RPORec, a reinforced preference optimization framework that unifies an LLM backbone's reasoning ability with a dedicated…
