Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective
Xin Xin, Tiago Pimentel, Alexandros Karatzoglou, Pengjie Ren, Konstantina Christakopoulou, Zhaochun Ren

TL;DR
This paper introduces Prompt-Based Reinforcement Learning (PRL), a novel offline training paradigm for recommendation systems that predicts items based on historical data and reward prompts, avoiding online exploration errors.
Contribution
It proposes a new offline RL training approach for recommendation agents that directly infers actions from state-reward prompts, simplifying training with supervised learning.
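As an illustration of what a "state-reward prompt" could look like, the sketch below turns one logged session into supervised training triples. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_prompt_examples`, the use of the item prefix as the state, and the use of the summed future reward as the prompt are all hypothetical choices, since this summary does not specify how the prompt is built.

```python
from typing import List, Tuple

# One training example: (state, reward_prompt, action).
Example = Tuple[Tuple[int, ...], float, int]


def build_prompt_examples(items: List[int], rewards: List[float]) -> List[Example]:
    """Turn one logged session into (state, reward_prompt, action) triples.

    Assumptions (not taken from the paper summary):
      - the state is the prefix of items seen so far,
      - the reward prompt is the sum of rewards from the current step onward,
      - the action (supervised target) is the item logged at the current step.
    """
    if len(items) != len(rewards):
        raise ValueError("items and rewards must have the same length")

    examples: List[Example] = []
    for t in range(1, len(items)):
        state = tuple(items[:t])          # interaction history before step t
        reward_prompt = sum(rewards[t:])  # hypothetical prompt: future reward
        action = items[t]                 # item to predict
        examples.append((state, reward_prompt, action))
    return examples


# Toy session: item ids with per-interaction rewards (e.g. 1.0 for a purchase).
session_items = [10, 42, 7, 42]
session_rewards = [0.0, 1.0, 0.0, 1.0]
examples = build_prompt_examples(session_items, session_rewards)
for state, prompt, action in examples:
    print(state, prompt, action)
```

Under these assumptions, a standard supervised classifier could be fit on such triples to predict the action from the state and prompt, with a desired reward value supplied as the prompt at inference time.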
Findings
PRL outperforms traditional RL methods on real-world datasets.
The approach simplifies offline training by using supervised learning.
Experimental results demonstrate superior recommendation accuracy.
Abstract
Modern recommender systems aim to improve user experience. As reinforcement learning (RL) naturally fits this objective -- maximizing a user's reward per session -- it has become an emerging topic in recommender systems. Developing RL-based recommendation methods, however, is not trivial due to the offline training challenge. Specifically, the keystone of traditional RL is to train an agent with large amounts of online exploration, making lots of 'errors' in the process. In the recommendation setting, though, we cannot afford the price of making 'errors' online. As a result, the agent needs to be trained through offline historical implicit feedback, collected under different recommendation policies; traditional RL algorithms may lead to sub-optimal policies under these offline training settings. Here we propose a new learning paradigm -- namely Prompt-Based Reinforcement…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Smart Grid Energy Management
