An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation

Jun Wang; Likang Wu; Qi Liu; Yu Yang

arXiv:2408.08047·cs.LG·August 16, 2024

TL;DR

This paper introduces ECoC, an efficient continuous control framework for reinforcement-learning-based sequential recommendation. It addresses the limitations of discrete action spaces and improves both training efficiency and long-term user engagement.

Contribution

The paper proposes a novel unified action representation and a continuous control framework for RL-based recommendation, enabling more efficient training and better long-term performance.

Findings

01. ECoC trains more efficiently than discrete-action baselines.

02. ECoC outperforms the baselines at capturing offline data.

03. ECoC achieves higher long-term rewards.

Abstract

Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique as they provide an additional advantage in avoiding global explorations that may harm online users' experiences. However, previous studies mainly focus on discrete action and policy spaces, which might have difficulties in handling dramatically growing items efficiently. To mitigate this issue, in this paper, we aim to design an algorithmic framework applicable to continuous policies. To facilitate the control in the low-dimensional but dense user preference space, we propose an Efficient Continuous Control framework (ECoC). Based on a…
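
To make the idea of a continuous action more concrete, here is a minimal Python sketch. It is not the paper's method, since the abstract above is cut off before describing ECoC itself. It assumes, purely for illustration, that a policy outputs a vector in the same low-dimensional space as the item embeddings and that items are retrieved by inner-product score. The function name, catalog size, and embedding dimension are all hypothetical.

```python
import numpy as np


def recommend_from_continuous_action(action, item_embeddings, k=3):
    """Return the indices and scores of the k items that best match a continuous action.

    Assumption (not from the paper): items are ranked by the inner product
    between their embedding and the action vector.
    """
    scores = item_embeddings @ action   # one score per item, shape (num_items,)
    top_k = np.argsort(-scores)[:k]     # item indices in descending score order
    return top_k, scores[top_k]


rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(1000, 16))  # hypothetical catalog: 1000 items, 16 dimensions
action = rng.normal(size=16)                   # stand-in for a continuous policy output

top_items, top_scores = recommend_from_continuous_action(action, item_embeddings, k=3)
print(top_items, top_scores)
```

In this sketch the policy's output dimension stays at 16 no matter how many items the catalog holds, which is the kind of scaling benefit the abstract attributes to continuous policies. A discrete policy would instead need one output per item.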

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research