Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
Tatsuki Takahashi, Chihiro Maru, Hiroko Shoji

TL;DR
This paper introduces a new off-policy evaluation method for ranking policies that models user behavior in embedding spaces to reduce variance and improve accuracy, which is especially useful when the ranking action space is large.
Contribution
It proposes the GMIPS estimator with favorable statistical properties and introduces MRIPS, a variant that balances bias and variance in large ranking action spaces.
Findings
GMIPS achieves the lowest mean squared error (MSE) among the estimators compared.
MRIPS balances bias and variance effectively in large action spaces.
Experimental results demonstrate improved off-policy evaluation accuracy.
Abstract
Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which stem from an increase in both the number of unique actions and the length of the ranking, is essential for assessing new recommender policies using only logged bandit data from previous versions. To address the high variance of existing estimators, we introduce two new assumptions: no direct effect on rankings, and a user behavior model on ranking embedding spaces. We then propose the generalized marginalized inverse propensity score (GMIPS) estimator, which has statistically desirable properties compared to existing ones. Finally, we demonstrate that GMIPS achieves the lowest MSE. Notably, among GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. MRIPS…
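The following is a minimal sketch of the general idea behind marginalized importance weights. It is not the paper's GMIPS or MRIPS estimator, whose definitions are not given here. It assumes a context-free toy bandit with six single actions rather than rankings, a deterministic action-to-embedding map, and rewards that depend on the action only through its embedding (a simplified stand-in for a "no direct effect" assumption). All names (`ips_estimate`, `marginalized_ips_estimate`, `action_to_embed`, `q_embed`) are hypothetical.

```python
import numpy as np

def ips_estimate(actions, rewards, pi_b, pi_e):
    """Vanilla IPS: reweight each logged reward by pi_e(a) / pi_b(a)."""
    w = pi_e[actions] / pi_b[actions]
    return float(np.mean(w * rewards))

def marginalized_ips_estimate(actions, rewards, pi_b, pi_e, action_to_embed):
    """IPS with weights computed on embedding marginals instead of actions.

    The marginal under a policy pi is p(e | pi) = sum of pi(a) over all
    actions a that map to embedding e (deterministic map assumed).
    """
    n_embed = int(action_to_embed.max()) + 1
    p_e_b = np.bincount(action_to_embed, weights=pi_b, minlength=n_embed)
    p_e_e = np.bincount(action_to_embed, weights=pi_e, minlength=n_embed)
    e = action_to_embed[actions]
    w = p_e_e[e] / p_e_b[e]
    return float(np.mean(w * rewards))

# Toy setup (all values are illustrative assumptions, not from the paper).
rng = np.random.default_rng(0)
action_to_embed = np.array([0, 0, 0, 1, 1, 1])   # 6 actions, 2 embeddings
q_embed = np.array([0.2, 0.8])                   # reward depends on embedding only
pi_b = np.array([0.3, 0.3, 0.2, 0.1, 0.05, 0.05])  # logging policy
pi_e = np.array([0.05, 0.05, 0.1, 0.2, 0.3, 0.3])  # evaluation policy

n = 5000
actions = rng.choice(6, size=n, p=pi_b)
rewards = rng.binomial(1, q_embed[action_to_embed[actions]]).astype(float)

true_value = float(np.sum(pi_e * q_embed[action_to_embed]))
v_ips = ips_estimate(actions, rewards, pi_b, pi_e)
v_mips = marginalized_ips_estimate(actions, rewards, pi_b, pi_e, action_to_embed)
print(f"true={true_value:.3f}  IPS={v_ips:.3f}  marginalized={v_mips:.3f}")
```

In this toy setup the action-level weights range from 1/6 to 6, while the embedding-level weights are only 0.25 or 4, which hints at why marginalizing over a coarser space can lower variance when many actions share an embedding.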
Taxonomy
Topics: Opinion Dynamics and Social Influence · Economic and Environmental Valuation · Human Mobility and Location-Based Analysis
