Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Zeyu Zhang, Yi Su, Hui Yuan, Yiran Wu, Rishab Balasubramanian, Qingyun Wu, Huazheng Wang, Mengdi Wang

arXiv:2306.07528 · cs.LG · October 31, 2023 · 2 cites

TL;DR

This paper introduces CUOLR, a reinforcement learning approach that unifies off-policy learning to rank across various click models by modeling the ranking process as an MDP, enabling robust and model-agnostic learning.

Contribution

The paper proposes a novel MDP formulation for off-policy LTR that is agnostic to click models and applies offline RL techniques for improved robustness and versatility.
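To make the MDP view concrete, here is a minimal Python sketch under stated assumptions: the state is the current position k together with the documents already placed, the action is the next document to place, and the reward is the expected click at that position. The two click models shown (a position-based model and a cascade model), the function names `pbm_reward`, `cascade_reward` and `expected_return`, and all probabilities are hypothetical illustrations, not the paper's formulation or code.

```python
from functools import partial
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical MDP state: (position k, documents already placed at 0..k-1).
State = Tuple[int, Tuple[str, ...]]
RewardFn = Callable[[State, str], float]


def pbm_reward(state: State, action: str, relevance: Dict[str, float],
               examination: Sequence[float]) -> float:
    """Position-based model: P(click) = P(examine position k) * P(attractive)."""
    k, _placed = state
    return examination[k] * relevance[action]


def cascade_reward(state: State, action: str,
                   relevance: Dict[str, float]) -> float:
    """Cascade model: a click at k requires that no earlier document was clicked."""
    _k, placed = state
    p_reach = 1.0
    for doc in placed:
        p_reach *= 1.0 - relevance[doc]
    return p_reach * relevance[action]


def expected_return(ranking: Sequence[str], reward_fn: RewardFn) -> float:
    """Roll out the ranking MDP: place one document per step, collect its
    expected click as reward, then append it to the state's prefix."""
    total = 0.0
    placed: Tuple[str, ...] = ()
    for k, doc in enumerate(ranking):
        total += reward_fn((k, placed), doc)
        placed = placed + (doc,)  # deterministic transition
    return total


relevance = {"a": 0.8, "b": 0.5, "c": 0.2}  # hypothetical attractiveness
examination = [1.0, 0.6, 0.3]               # hypothetical PBM examination probs
pbm = partial(pbm_reward, relevance=relevance, examination=examination)
cascade = partial(cascade_reward, relevance=relevance)

print(round(expected_return(["a", "b", "c"], pbm), 4))      # 1.16
print(round(expected_return(["a", "b", "c"], cascade), 4))  # 0.92
```

Under the position-based model the reward depends only on k and the chosen document, whereas under the cascade model it also depends on which documents came earlier; keeping the placed prefix in the state is what lets a single MDP accommodate both.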

Findings

01

CUOLR outperforms existing algorithms on large-scale datasets.

02

It maintains robustness across different click models.

03

The method simplifies off-policy LTR by removing the need for complex debiasing techniques.

Abstract

Off-policy Learning to Rank (LTR) aims to optimize a ranker from data collected by a deployed logging policy. However, existing off-policy learning to rank methods often make strong assumptions about how users generate the click data, i.e., the click model, and hence need to tailor their methods specifically to different click models. In this paper, we unify the ranking process under general stochastic click models as a Markov Decision Process (MDP), so that the optimal ranking can be learned directly with offline reinforcement learning (RL). Building upon this, we leverage offline RL techniques for off-policy LTR and propose the Click Model-Agnostic Unified Off-policy Learning to Rank (CUOLR) method, which can easily be applied to a wide range of click models. Through a dedicated formulation of the MDP, we show that offline RL algorithms can adapt to various click models without…
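The idea that a ranking policy can be learned from logged clicks with offline RL can be illustrated with a deliberately small example. The sketch below runs plain tabular fitted Q iteration over logged (ranking, clicks) episodes, using a hypothetical state encoding of (position, placed prefix), and then ranks greedily by the learned Q-values. It is not CUOLR: the helper names `episodes_to_transitions`, `fitted_q` and `greedy_ranking` and the toy log are hypothetical, and defaulting unlogged state-action pairs to 0 is only a crude stand-in for the pessimism that practical offline RL methods combine with function approximation.

```python
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, Tuple[str, ...]]  # hypothetical: (position, placed prefix)
Transition = Tuple[State, str, float, State, bool]


def episodes_to_transitions(
        episodes: Sequence[Tuple[Sequence[str], Sequence[int]]]) -> List[Transition]:
    """Turn logged (ranking, clicks) pairs into MDP transitions."""
    transitions: List[Transition] = []
    for ranking, clicks in episodes:
        placed: Tuple[str, ...] = ()
        for k, (doc, click) in enumerate(zip(ranking, clicks)):
            state = (k, placed)
            placed = placed + (doc,)
            done = k == len(ranking) - 1
            transitions.append((state, doc, float(click), (k + 1, placed), done))
    return transitions


def fitted_q(transitions: List[Transition], docs: Sequence[str],
             n_iters: int = 10, gamma: float = 1.0) -> Dict[Tuple[State, str], float]:
    """Tabular fitted Q iteration using only logged transitions. Unlogged
    (state, action) pairs default to 0, a lower bound since clicks are >= 0."""
    q: Dict[Tuple[State, str], float] = {}
    for _ in range(n_iters):
        targets: Dict[Tuple[State, str], List[float]] = {}
        for state, action, reward, next_state, done in transitions:
            target = reward
            if not done:
                remaining = [d for d in docs if d not in next_state[1]]
                target += gamma * max(q.get((next_state, d), 0.0) for d in remaining)
            targets.setdefault((state, action), []).append(target)
        # Regress (here: average) the Bellman targets for each logged pair.
        q = {sa: sum(v) / len(v) for sa, v in targets.items()}
    return q


def greedy_ranking(q: Dict[Tuple[State, str], float], docs: Sequence[str],
                   length: int) -> List[str]:
    """Build a ranking by picking the highest-Q document at each position."""
    placed: Tuple[str, ...] = ()
    for k in range(length):
        remaining = [d for d in docs if d not in placed]
        best = max(remaining, key=lambda d: q.get(((k, placed), d), 0.0))
        placed = placed + (best,)
    return list(placed)


docs = ["a", "b", "c"]
episodes = [(["b", "a"], [0, 1]), (["a", "c"], [1, 1]), (["a", "b"], [1, 0])]
q = fitted_q(episodes_to_transitions(episodes), docs)
print(greedy_ranking(q, docs, 2))  # ['a', 'c']
```

Because the state carries the full prefix, this tabular version can only reuse information across identical prefixes; generalizing to rankings that never appear in the log requires a learned function approximator.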

Code & Models

Videos

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective· slideslive

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Economic and Environmental Valuation · Optimization and Search Problems