Pessimistic Off-Policy Optimization for Learning to Rank

Matej Cief; Branislav Kveton; Michal Kompan

arXiv:2206.02593 · cs.LG · October 23, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a pessimistic off-policy optimization method for learning to rank in recommender systems, addressing data imbalance and combinatorial action spaces by using confidence bounds to improve policy learning.

Contribution

The paper proposes a computationally efficient pessimistic approach to off-policy optimization for learning to rank, with Bayesian and frequentist variants. It uses empirical Bayes to handle unknown priors, and in the reported experiments it outperforms inverse propensity score methods.

Findings

01. Outperforms inverse propensity score methods in experiments

02. Robust and general approach demonstrated empirically

03. Effective in addressing data imbalance in ranking tasks

Abstract

Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to…
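The key idea in the abstract, scoring each candidate by a lower confidence bound and returning the list with the highest pessimistic value, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-item click model (each item has its own click probability, and a list's value grows with the probabilities of the items in it), the Hoeffding-style bound, the value of `delta`, and all function names and toy counts.

```python
import math


def hoeffding_lcb(clicks, impressions, delta=0.05):
    """Lower confidence bound on a click probability (assumed Hoeffding bound)."""
    if impressions == 0:
        return 0.0  # no data: the most pessimistic estimate
    mean = clicks / impressions
    width = math.sqrt(math.log(1.0 / delta) / (2.0 * impressions))
    return max(0.0, mean - width)


def pessimistic_top_k(stats, k, delta=0.05):
    """Rank items by their lower confidence bound and keep the best k."""
    scores = {item: hoeffding_lcb(c, n, delta) for item, (c, n) in stats.items()}
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


def greedy_top_k(stats, k):
    """Baseline: rank items by their empirical click rate only."""
    means = {item: (c / n if n else 0.0) for item, (c, n) in stats.items()}
    return sorted(means, key=lambda item: (-means[item], item))[:k]


# Hypothetical logged data: item -> (clicks, impressions).
# Items "b" and "d" were rarely shown, so their estimates are unreliable.
logged = {"a": (300, 1000), "b": (2, 2), "c": (250, 1000), "d": (1, 10)}

print(pessimistic_top_k(logged, k=2))
print(greedy_top_k(logged, k=2))
```

With these toy counts, the empirical-rate baseline ranks item `b` first on the strength of two impressions, while the pessimistic ranking prefers the well-explored items `a` and `c`. This is the kind of data imbalance the abstract describes; a Bayesian variant would replace the bound with a posterior quantile.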

Taxonomy

Topics: Machine Learning and Algorithms · Recommender Systems and Techniques · Optimization and Search Problems