Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology
Eugene Ie, Vihan Jain, Jing Wang, Sanmit Narvekar, Ritesh Agarwal, Rui Wu, Heng-Tze Cheng, Morgane Lustman, Vince Gatto, Paul Covington, Jim McFadden, Tushar Chandra, Craig Boutilier

TL;DR
This paper introduces SLATEQ, an RL-based approach to slate recommendation that decomposes the long-term value of a slate so that optimizing for long-term engagement remains scalable. The approach is validated in simulation and in live experiments on YouTube.
Contribution
The paper presents a new decomposition method for RL in slate recommendations, enabling tractable long-term value optimization and practical implementation.
Findings
SLATEQ effectively decomposes slate value, making RL scalable.
The methodology leverages existing recommenders for long-term optimization.
Live experiments on YouTube validate the approach's scalability and effectiveness.
Abstract
Most practical recommender systems focus on estimating immediate user engagement without considering the long-term effects of recommendations on user behavior. Reinforcement learning (RL) methods offer the potential to optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items - which may have interacting effects on user choice - methods are required to deal with the combinatorics of the RL action space. In this work, we address the challenge of making slate-based recommendations to optimize long-term value using RL. Our contributions are three-fold. (i) We develop SLATEQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of…
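The decomposition idea in the abstract can be illustrated with a small sketch. It assumes the slate's long-term value is the choice-probability-weighted sum of per-item long-term values, and it assumes a conditional-logit (softmax) user choice model with a no-click option that contributes zero value. The function names (choice_probs, slate_q, best_slate), the scores, and the item values are all hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
from itertools import combinations


def choice_probs(scores, null_score=0.0):
    """Click probability for each slate item under an assumed conditional-logit
    choice model that also includes a no-click option."""
    weights = [math.exp(s) for s in scores]
    denom = sum(weights) + math.exp(null_score)
    return [w / denom for w in weights]


def slate_q(scores, item_q, null_score=0.0):
    """Slate value as the choice-weighted sum of item-wise long-term values.
    Assumption: the no-click outcome contributes zero value in this sketch."""
    probs = choice_probs(scores, null_score)
    return sum(p * q for p, q in zip(probs, item_q))


def best_slate(cand_scores, cand_q, k, null_score=0.0):
    """Exhaustive search over all size-k slates (illustration only; the number
    of slates grows combinatorially with the candidate set)."""
    def value(idx):
        return slate_q([cand_scores[i] for i in idx],
                       [cand_q[i] for i in idx],
                       null_score)
    return max(combinations(range(len(cand_scores)), k), key=value)


# Hypothetical candidates: equal appeal scores, different long-term values.
cand_scores = [0.0, 0.0, 0.0]
cand_q = [1.0, 5.0, 3.0]
chosen = best_slate(cand_scores, cand_q, k=2)
print(chosen, slate_q([cand_scores[i] for i in chosen],
                      [cand_q[i] for i in chosen]))
```

The point of the sketch is that only per-item values need to be learned. The value of any slate is then assembled from them and the choice model, rather than learning one value per possible slate.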
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Bandit Algorithms Research · Smart Grid Energy Management
Methods: Q-Learning
