An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation
Jun Wang, Likang Wu, Qi Liu, Yu Yang

TL;DR
This paper introduces ECoC, an efficient continuous control framework for reinforcement learning-based sequential recommendation, addressing the limitations of discrete action spaces and improving training efficiency and long-term user engagement.
Contribution
The paper proposes a novel unified action representation and a continuous control framework for RL-based recommendation, enabling more efficient training and better long-term performance.
Findings
ECoC trains more efficiently than discrete baselines.
ECoC outperforms baselines in offline data capture.
ECoC achieves higher long-term rewards.
Abstract
Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique, as they avoid the global exploration that may harm online users' experiences. However, previous studies mainly focus on discrete action and policy spaces, which may struggle to handle a rapidly growing number of items efficiently. To mitigate this issue, in this paper, we aim to design an algorithmic framework applicable to continuous policies. To facilitate control in the low-dimensional but dense user preference space, we propose an Efficient Continuous Control framework (ECoC). Based on a…
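To make the discrete-versus-continuous distinction in the abstract concrete, here is a minimal sketch in plain Python. It is not ECoC's method, which the truncated abstract does not describe; it only illustrates the general idea that a continuous policy can emit a low-dimensional vector and leave item selection to a separate scoring step. The names `top_k_items` and `item_embeddings`, the inner-product scoring rule, and the toy numbers are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_items(action, item_embeddings, k):
    """Rank items by inner product with a continuous action vector.

    Assumption: items live in the same low-dimensional space as the action.
    Ties are broken by the smaller item index.
    """
    scored = [(dot(action, emb), idx) for idx, emb in enumerate(item_embeddings)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


# Toy catalog: three items embedded in a 2-dimensional preference space.
item_embeddings = [
    [1.0, 0.0],  # item 0
    [0.0, 1.0],  # item 1
    [0.5, 0.5],  # item 2
]

# A discrete policy head needs one output per item, so it grows with the catalog.
discrete_outputs = len(item_embeddings)
# A continuous policy head needs one output per embedding dimension instead.
continuous_outputs = len(item_embeddings[0])

# Hypothetical action emitted by a continuous policy for the current user state.
action = [0.9, 0.1]
recommended = top_k_items(action, item_embeddings, k=2)
```

In this toy setup the size of the policy output depends on the embedding dimension rather than on the catalog size, which is the scaling concern the abstract raises about discrete action spaces.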
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research
