Supervised Advantage Actor-Critic for Recommender Systems
Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

TL;DR
This paper introduces a novel RL-based framework for sequential recommendation that combines supervised learning with advantage actor-critic methods, addressing key challenges like biased Q-value estimation and large action spaces.
Contribution
It proposes the Supervised Advantage Actor-Critic (SA2C) framework with a negative sampling strategy, improving recommendation performance over existing methods.
Findings
Significantly outperforms state-of-the-art supervised methods.
Achieves better results than existing self-supervised RL approaches.
Demonstrates effectiveness on real-world datasets.
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential…
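To make the negative-sampling idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the uniform sampler, the zero reward for negatives, the choice not to bootstrap the negative targets, and the mean-over-sampled-actions baseline are all assumptions made for illustration. It shows how adding sampled, non-interacted items to a temporal-difference loss gives the Q-function explicit negative signals, and how subtracting a baseline over the sampled actions yields an advantage-style score that is less tied to the absolute Q-value scale at a given step.

```python
import random


def sample_negatives(num_items, positive, k, rng):
    """Draw k distinct item ids uniformly at random, excluding the positive item.

    Uniform sampling is an assumption of this sketch.
    """
    candidates = [i for i in range(num_items) if i != positive]
    return rng.sample(candidates, k)


def negative_sampling_td_loss(q, positive, negatives, reward_pos,
                              next_max_q, gamma=0.5, reward_neg=0.0):
    """Squared TD error on the observed item plus squared errors on negatives.

    q          -- list of Q(s, a) for every item a in the current state s
    next_max_q -- max over a' of Q(s', a') for the next state (given, not computed)
    Negative targets are just reward_neg here (no bootstrapping); that
    simplification is an assumption of this sketch.
    """
    pos_target = reward_pos + gamma * next_max_q
    loss = (pos_target - q[positive]) ** 2
    for a in negatives:
        loss += (reward_neg - q[a]) ** 2
    return loss


def advantage(q, positive, negatives):
    """Q-value of the observed item minus the mean Q over the sampled actions.

    The mean-over-samples baseline is a hypothetical choice for illustration.
    """
    sampled = [positive] + list(negatives)
    baseline = sum(q[a] for a in sampled) / len(sampled)
    return q[positive] - baseline


# Toy usage with made-up Q-values for a 4-item catalog.
rng = random.Random(0)
q_values = [1.0, 0.5, 0.25, 0.0]
negs = sample_negatives(num_items=4, positive=0, k=2, rng=rng)
loss = negative_sampling_td_loss(q_values, 0, negs, reward_pos=1.0, next_max_q=1.0)
adv = advantage(q_values, 0, negs)
```

In a full model the advantage would typically be used as a weight on the supervised loss for the observed item, but how that weighting is done is not stated in the truncated abstract above, so it is left out of the sketch.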
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Machine Learning in Healthcare
Methods: Q-Learning
