Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
Romain Deffayet, Thibaut Thonet, Jean-Michel Renders, Maarten de Rijke

TL;DR
This paper critiques current offline evaluation methods for reinforcement learning-based recommender systems, highlighting their shortcomings and proposing alternative evaluation approaches to better reflect RL benefits.
Contribution
It identifies the limitations of next-item prediction protocols in offline RL recommendation evaluation and suggests new methods to improve assessment reliability.
Findings
Current evaluation protocols hide RL deficiencies
Next-item prediction does not reflect RL benefits
Proposed alternatives aim for more reliable evaluation
Abstract
In this paper, we argue that the paradigm commonly adopted for offline evaluation of sequential recommender systems is unsuitable for evaluating reinforcement learning-based recommenders. We find that most of the existing offline evaluation practices for reinforcement learning-based recommendation are based on a next-item prediction protocol, and detail three shortcomings of such an evaluation protocol. Notably, it cannot reflect the potential benefits that reinforcement learning (RL) is expected to bring while it hides critical deficiencies of certain offline RL agents. Our suggestions for alternative ways to evaluate RL-based recommender systems aim to shed light on the existing possibilities and inspire future research on reliable evaluation protocols.
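As an illustration of what a next-item prediction protocol typically looks like, here is a minimal sketch. It is not the authors' code, and the names `hit_rate_at_k` and `make_popularity_recommender` are hypothetical, as is the choice of hit rate as the metric. It assumes each logged session is a list of item IDs, holds out the last item as the target, and checks whether a recommender places that item in its top k.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Item = Hashable
Recommender = Callable[[Sequence[Item]], List[Item]]


def hit_rate_at_k(sequences: Sequence[Sequence[Item]],
                  recommend: Recommender,
                  k: int = 10) -> float:
    """Fraction of sessions whose held-out last item is in the top-k list."""
    hits = 0
    evaluated = 0
    for seq in sequences:
        if len(seq) < 2:
            # Need at least one history item and one target item.
            continue
        history, target = seq[:-1], seq[-1]
        top_k = recommend(history)[:k]
        hits += int(target in top_k)
        evaluated += 1
    return hits / evaluated if evaluated else 0.0


def make_popularity_recommender(train_sequences: Sequence[Sequence[Item]]) -> Recommender:
    """Toy baseline (assumption): rank items by training frequency, skip seen items."""
    counts = Counter(item for seq in train_sequences for item in seq)
    ranking = [item for item, _ in counts.most_common()]

    def recommend(history: Sequence[Item]) -> List[Item]:
        seen = set(history)
        return [item for item in ranking if item not in seen]

    return recommend
```

Note that the score depends only on whether the single logged next item is retrieved. Nothing in this computation measures cumulative reward over a session, which is the quantity an RL agent is trained to optimize.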
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management
