Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling

Tatsuki Takahashi; Chihiro Maru; Hiroko Shoji

arXiv:2506.00446 · stat.ML · June 3, 2025

TL;DR

This paper introduces an off-policy evaluation method for ranking policies that models user behavior in ranking embedding spaces to reduce variance and improve accuracy. It targets settings with large ranking action spaces.

Contribution

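The paper proposes the generalized marginalized inverse propensity score (GMIPS) estimator, which has favorable statistical properties compared to existing estimators, and introduces the marginalized reward interaction IPS (MRIPS), a GMIPS variant that balances bias and variance in large ranking action spaces.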

Findings

01. GMIPS achieves the lowest mean squared error (MSE) among the estimators compared.

02. MRIPS balances bias and variance effectively in large action spaces.

03. Experimental results demonstrate improved off-policy evaluation accuracy.

Abstract

Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which stem from an increase in both the number of unique actions and the length of the ranking, is essential for assessing new recommender policies using only logged bandit data from previous versions. To address the high variance of existing estimators, we introduce two new assumptions: no direct effect on rankings and a user behavior model on ranking embedding spaces. We then propose the generalized marginalized inverse propensity score (GMIPS) estimator, which has statistically desirable properties compared to existing ones. Finally, we demonstrate that GMIPS achieves the lowest MSE. Notably, among GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. MRIPS…
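
To make the variance issue mentioned in the abstract concrete, the following is a minimal sketch of the standard ranking-wise IPS baseline, not the paper's GMIPS or MRIPS estimators. It assumes, purely for illustration, a context-free setting in which both policies pick the item at each position independently, and a synthetic additive reward. All names (`ranking_importance_weights`, `ips_estimate`, `simulate`) are hypothetical.

```python
import numpy as np


def ranking_importance_weights(pi_target, pi_logging, rankings):
    """Ranking-wise importance weights w = prod_k pi(a_k) / pi_0(a_k).

    pi_target, pi_logging: arrays of shape (K, n_items); row k is the
        distribution over items at position k (assumed factorized policy).
    rankings: integer array of shape (n, K) holding logged item indices.
    """
    positions = np.arange(rankings.shape[1])
    # Broadcasting yields an (n, K) array of per-position probability ratios.
    ratio = pi_target[positions, rankings] / pi_logging[positions, rankings]
    return ratio.prod(axis=1)


def ips_estimate(weights, rewards):
    """Vanilla IPS: average of importance-weighted ranking-level rewards."""
    return float(np.mean(weights * rewards))


def simulate(n=5000, n_items=10, K=5, seed=0):
    """Synthetic logged data (an assumption, not the paper's setup)."""
    rng = np.random.default_rng(seed)
    pi_logging = rng.dirichlet(np.ones(n_items), size=K)
    pi_target = rng.dirichlet(np.ones(n_items), size=K)
    # Sample each position independently from the logging policy.
    rankings = np.stack(
        [rng.choice(n_items, size=n, p=pi_logging[k]) for k in range(K)],
        axis=1,
    )
    # Illustrative reward: sum of fixed per-item values over the ranking.
    item_value = rng.uniform(0.0, 1.0, size=n_items)
    rewards = item_value[rankings].sum(axis=1)
    weights = ranking_importance_weights(pi_target, pi_logging, rankings)
    true_value = float((pi_target @ item_value).sum())
    return ips_estimate(weights, rewards), true_value, weights


if __name__ == "__main__":
    estimate, truth, weights = simulate()
    print("IPS estimate:", estimate)
    print("true policy value:", truth)
    print("largest importance weight:", weights.max())
```

Because each weight is a product over all K positions, its spread tends to grow with the ranking length and the number of items. This is the kind of variance that motivates marginalizing the importance weight over a lower-dimensional representation, as the abstract describes for ranking embeddings.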

Taxonomy

Topics: Opinion Dynamics and Social Influence · Economic and Environmental Valuation · Human Mobility and Location-Based Analysis