Long-term Off-Policy Evaluation and Learning

Yuta Saito; Himan Abdollahpouri; Jesse Anderton; Ben Carterette; Mounia Lalmas

arXiv:2404.15691 · cs.LG · April 25, 2024

TL;DR

This paper introduces LOPE, a framework for estimating the long-term outcome of an algorithm from historical and short-term experiment data. The authors report more accurate estimates than existing methods, especially when the surrogacy assumption is violated.

Contribution

LOPE is a novel reward decomposition framework that relaxes surrogacy assumptions and better utilizes short-term rewards for long-term outcome estimation.

Findings

1. LOPE outperforms existing methods in synthetic experiments with noisy long-term rewards.

2. LOPE provides more accurate long-term outcome estimates on real-world music streaming data.

3. LOPE is effective even when surrogacy assumptions are severely violated.

Abstract

Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy…
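
The surrogacy issue mentioned in the abstract can be illustrated with a small simulation. The sketch below is not LOPE and is not taken from the paper or its repository; it is a toy version of a surrogate-based baseline under assumed linear dynamics. All names (`simulate`, `surrogate_index_estimate`, `surrogacy_gap`), the coefficients, and the two policies are hypothetical. It fits a model of the long-term reward from the short-term reward on historical data, applies that model to short-term data from a new policy, and compares the result with the true long-term outcome. When the action also affects the long-term reward directly (a click-bait style effect that the short-term reward does not capture), the estimate is expected to be biased.

```python
import numpy as np


def simulate(n, p_action1, direct_effect, rng):
    """Draw actions, short-term rewards s, and long-term rewards r.

    direct_effect is the part of r that depends on the action but is
    NOT mediated by s. Surrogacy holds only when it is zero.
    (Toy linear model; all coefficients are assumptions.)
    """
    a = rng.binomial(1, p_action1, size=n)
    s = rng.normal(loc=1.0 * a, scale=1.0)  # action 1 raises short-term reward
    r = 0.5 * s + direct_effect * a + rng.normal(scale=1.0, size=n)
    return a, s, r


def surrogate_index_estimate(s_hist, r_hist, s_new):
    """Fit r ~ s on historical data, then average predictions on new s."""
    slope, intercept = np.polyfit(s_hist, r_hist, deg=1)
    return float(np.mean(intercept + slope * s_new))


def surrogacy_gap(direct_effect, n=200_000, seed=0):
    """Return (surrogate-based estimate) - (true long-term outcome)."""
    rng = np.random.default_rng(seed)
    # Historical data: old policy rarely picks action 1; r is observed.
    _, s_hist, r_hist = simulate(n, 0.2, direct_effect, rng)
    # Short-term experiment: new policy often picks action 1.
    # r_new would not be observed in practice; it serves as ground truth here.
    _, s_new, r_new = simulate(n, 0.8, direct_effect, rng)
    estimate = surrogate_index_estimate(s_hist, r_hist, s_new)
    return estimate - float(np.mean(r_new))


if __name__ == "__main__":
    print("gap with surrogacy satisfied:", surrogacy_gap(0.0))
    print("gap with surrogacy violated: ", surrogacy_gap(-1.0))
```

In this toy setup the gap should be close to zero when `direct_effect` is 0 and clearly positive when it is negative, meaning the surrogate-based estimate overstates the long-term outcome of the click-heavy policy.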

Code & Models

Repositories

usaito/www2024-lope (PyTorch, official)

Taxonomy

Topics: Evaluation and Performance Assessment