Generative Slate Recommendation with Reinforcement Learning
Romain Deffayet, Thibaut Thonet, Jean-Michel Renders, Maarten de Rijke

TL;DR
This paper introduces a reinforcement learning approach to slate recommendation in which a variational auto-encoder encodes slates (lists of items) into a continuous, low-dimensional latent space, making RL over slates tractable and supporting diverse recommendations and long-term user engagement.
Contribution
It proposes a continuous latent-space encoding of slates for RL, relaxing the restrictive assumptions that earlier action decompositions relied on and improving the diversity and quality of recommendations.
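As a rough illustration of the idea, the sketch below maps a continuous latent vector to a slate by scoring every item for every slot and keeping the best-scoring item per slot. It is only a minimal sketch under stated assumptions: the linear decoder with random weights is a stand-in for a trained VAE decoder, and the names make_decoder and decode_slate, along with all sizes, are hypothetical and not taken from the paper.

```python
import random

def make_decoder(latent_dim, slate_size, num_items, seed=0):
    """Build a stand-in decoder: one weight vector per (slot, item) pair.

    Assumption: random weights replace what would be a trained VAE decoder.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(num_items)]
        for _ in range(slate_size)
    ]

def decode_slate(z, decoder):
    """Map a latent vector z to a slate: one item index per slot."""
    slate = []
    for slot_weights in decoder:
        # Score each item for this slot with a dot product against z.
        scores = [sum(w * x for w, x in zip(item_w, z)) for item_w in slot_weights]
        # Keep the highest-scoring item; duplicates across slots are not prevented here.
        slate.append(max(range(len(scores)), key=scores.__getitem__))
    return slate

# Hypothetical sizes: 4-dimensional latent space, slates of 3 items, 20-item catalog.
decoder = make_decoder(latent_dim=4, slate_size=3, num_items=20)
latent_action = [0.1, -0.2, 0.3, 0.0]  # what an RL agent would output
print(decode_slate(latent_action, decoder))
```

In this setup the agent only has to choose a point in a 4-dimensional continuous space, and the decoder turns that point into a full slate.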
Findings
Effective in simulated environments with relaxed assumptions
Outperforms baselines in diversity and long-term engagement
Shows promise for generalizable RL-based slate recommendation
Abstract
Recent research has employed reinforcement learning (RL) algorithms to optimize long-term user engagement in recommender systems, thereby avoiding common pitfalls such as user boredom and filter bubbles. They capture the sequential and interactive nature of recommendations, and thus offer a principled way to deal with long-term rewards and avoid myopic behaviors. However, RL approaches are intractable in the slate recommendation scenario - where a list of items is recommended at each interaction turn - due to the combinatorial action space. In that setting, an action corresponds to a slate that may contain any combination of items. While previous work has proposed well-chosen decompositions of actions so as to ensure tractability, these rely on restrictive and sometimes unrealistic assumptions. Instead, in this work we propose to encode slates in a continuous, low-dimensional latent…
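To give a sense of the combinatorial action space mentioned in the abstract, the following minimal sketch counts the possible slates, assuming a slate is an ordered list of distinct items. That assumption, the function name slate_count, and the catalog and slate sizes are illustrative choices, not details taken from the paper.

```python
import math

def slate_count(num_items: int, slate_size: int) -> int:
    """Number of ordered slates of distinct items: n! / (n - k)!."""
    return math.perm(num_items, slate_size)

# Hypothetical sizes, chosen only for illustration.
print(slate_count(1000, 10))
```

Even with these modest hypothetical sizes the count is close to 10^30, far too many discrete actions for an RL agent to enumerate.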
