Distributional Off-Policy Evaluation for Slate Recommendations
Shreyas Chaudhari, David Arbour, Georgios Theocharous, Nikos Vlassis

TL;DR
This paper introduces a new estimator for the complete off-policy performance distribution in slate recommendation systems, enabling more comprehensive evaluation along risk and fairness axes.
Contribution
It develops a consistent estimator for the full performance distribution of slate recommendation strategies, unbiased under conditions the paper establishes, extending prior off-policy evaluation methods that estimate only expected performance.
Findings
Significant variance reduction in estimates
Improved sample efficiency over previous methods
Validated on synthetic and real-world data
Abstract
Recommendation strategies are typically evaluated by using previously logged data, employing off-policy evaluation methods to estimate their expected performance. However, for strategies that present users with slates of multiple items, the resulting combinatorial action space renders many of these methods impractical. Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive. Estimating the complete distribution allows for a more comprehensive evaluation of recommendation strategies, particularly along the axes of risk and fairness that employ metrics computable from the distribution. In this paper, we propose an estimator for the complete off-policy performance distribution for slates and establish conditions under which the estimator is unbiased…
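To make the idea of "metrics computable from the distribution" concrete, here is a minimal sketch of a generic importance-weighted estimate of an off-policy reward CDF, with a lower-tail risk metric (CVaR) read off it. This is not the estimator proposed in the paper: it uses plain self-normalized importance weighting, which is consistent but not unbiased, and it does not exploit slate structure beyond assuming that both policies factorize across slots. The function names (slate_importance_weights, weighted_cdf, cvar_lower) and the factorization assumption are illustrative only.

```python
import numpy as np

def slate_importance_weights(target_probs, logging_probs):
    """Slate-level weight as the product of per-slot probability ratios.

    Inputs have shape (n_samples, n_slots). Assumes (for this sketch only)
    that both the target and logging policies factorize across slots.
    """
    ratios = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return ratios.prod(axis=1)

def weighted_cdf(rewards, weights):
    """Self-normalized importance-weighted estimate of the reward CDF.

    Returns the sorted rewards and the estimated CDF evaluated at them.
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(rewards)
    sorted_rewards = rewards[order]
    cdf = np.cumsum(weights[order]) / weights.sum()
    return sorted_rewards, cdf

def cvar_lower(sorted_rewards, cdf, alpha):
    """Mean reward over the worst alpha fraction of outcomes under the estimated CDF."""
    mass = np.diff(cdf, prepend=0.0)               # probability mass at each sorted reward
    prev = cdf - mass                              # CDF value just before each point
    tail_mass = np.clip(alpha - prev, 0.0, mass)   # part of each mass inside the alpha tail
    return float((tail_mass * sorted_rewards).sum() / alpha)
```

Given logged rewards and per-slot action probabilities under both policies, one would compute the weights, pass them to weighted_cdf, and then evaluate any distributional metric (quantiles, CVaR, variance) on the result. With combinatorially large slate spaces, the product weights above can have very high variance, which is the practical problem that structure-aware estimators aim to address.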
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Reinforcement Learning in Robotics
