Distributional Off-Policy Evaluation for Slate Recommendations

Shreyas Chaudhari; David Arbour; Georgios Theocharous; Nikos Vlassis

arXiv:2308.14165·cs.IR·December 29, 2023

TL;DR

This paper introduces a new estimator for the complete off-policy performance distribution in slate recommendation systems, enabling more comprehensive evaluation along risk and fairness axes.

Contribution

The paper develops an estimator for the full performance distribution of slate recommendation strategies and establishes conditions under which it is unbiased and consistent, extending prior off-policy evaluation methods that estimate only expected performance.

Findings

1. Significant variance reduction in estimates
2. Improved sample efficiency over previous methods
3. Validated on synthetic and real-world data

Abstract

Recommendation strategies are typically evaluated by using previously logged data, employing off-policy evaluation methods to estimate their expected performance. However, for strategies that present users with slates of multiple items, the resulting combinatorial action space renders many of these methods impractical. Prior work has developed estimators that leverage the structure in slates to estimate the expected off-policy performance, but the estimation of the entire performance distribution remains elusive. Estimating the complete distribution allows for a more comprehensive evaluation of recommendation strategies, particularly along the axes of risk and fairness that employ metrics computable from the distribution. In this paper, we propose an estimator for the complete off-policy performance distribution for slates and establish conditions under which the estimator is unbiased…
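To make the idea of estimating a performance distribution from logged slates concrete, here is a minimal sketch of an importance-weighted empirical CDF. It is an illustration under stated assumptions, not necessarily the authors' exact estimator: the abstract does not give the estimator's form, so the sketch assumes a logging policy that factorizes over slate slots and an additive per-slot weight of the kind used in prior slate estimators. The function names (`slate_weight`, `estimate_cdf`, `monotone_clip`) are hypothetical.

```python
from typing import List, Sequence


def slate_weight(pi_probs: Sequence[float], mu_probs: Sequence[float]) -> float:
    """Assumed additive slate weight: G = 1 - K + sum_k pi(a_k|x) / mu(a_k|x).

    pi_probs / mu_probs hold the target / logging probability of the logged
    item in each of the K slots (assumes the logging policy factorizes).
    """
    num_slots = len(pi_probs)
    return 1.0 - num_slots + sum(p / m for p, m in zip(pi_probs, mu_probs))


def estimate_cdf(
    rewards: Sequence[float],
    weights: Sequence[float],
    thresholds: Sequence[float],
) -> List[float]:
    """Weighted empirical CDF: F_hat(v) = (1/n) * sum_i G_i * 1[r_i <= v]."""
    n = len(rewards)
    return [
        sum(w for r, w in zip(rewards, weights) if r <= v) / n
        for v in thresholds
    ]


def monotone_clip(cdf: Sequence[float]) -> List[float]:
    """Project a raw estimate onto a valid CDF: clip to [0, 1], then make it
    non-decreasing. The raw estimate can violate both because the assumed
    weights may be negative."""
    out: List[float] = []
    running = 0.0
    for value in cdf:
        running = max(running, min(max(value, 0.0), 1.0))
        out.append(running)
    return out


# Sanity check: when target and logging policies coincide, every weight is 1
# and the estimate reduces to the ordinary empirical CDF of logged rewards.
logged_rewards = [0.0, 1.0, 1.0, 0.0]
weights = [slate_weight([0.5, 0.5], [0.5, 0.5]) for _ in logged_rewards]
cdf = estimate_cdf(logged_rewards, weights, thresholds=[0.0, 1.0])
```

Once a CDF estimate is available, quantities such as the mean, quantiles, or tail-risk measures can be read off it, which is the kind of risk- and fairness-oriented evaluation the abstract refers to.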

Code & Models

Repositories

shreyasc-13/suno (Official)

Videos

Distributional Off-Policy Evaluation for Slate Recommendations · underline

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Reinforcement Learning in Robotics