Long-term Off-Policy Evaluation and Learning
Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas

TL;DR
This paper introduces LOPE, a framework for estimating the long-term outcome of an algorithm from historical and short-term experiment data. It is more accurate and efficient than existing methods, especially when the surrogacy assumption is violated.
Contribution
LOPE is a novel reward decomposition framework that relaxes the surrogacy assumption and makes better use of short-term rewards than existing approaches when estimating long-term outcomes.
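To make the decomposition idea concrete, here is a minimal Python sketch under an additive toy model that we assume for illustration: r = g(s) + h(a) + noise, where g is the part of the long-term reward r explained by the short-term reward s and h is an action effect that s misses. It fits both parts jointly by least squares on historical data, then plugs in the actions and short-term rewards from a short-term experiment. This is a simple regression plug-in, not the LOPE estimator described in the paper; the model, the policies pi_0 and pi_e, and the helpers simulate and design are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 3

# Hypothetical toy model (our assumption, not the paper's):
# r = g(s) + h(a) + noise, with g(s) = 0.8 * s. The term h(a) is an action
# effect that s does not capture, so surrogacy (r independent of a given s) fails.
mean_s = np.array([1.0, 2.0, 3.0])
h_true = np.array([0.5, 0.0, -2.0])

def simulate(policy, n):
    """Draw n rounds of (action, short-term reward, long-term reward)."""
    a = rng.choice(n_actions, size=n, p=policy)
    s = mean_s[a] + rng.normal(0.0, 1.0, size=n)
    r = 0.8 * s + h_true[a] + rng.normal(0.0, 1.0, size=n)
    return a, s, r

def design(a, s):
    """Columns: intercept, short-term reward s, one-hot for actions 1..K-1."""
    onehot = np.eye(n_actions)[a][:, 1:]
    return np.column_stack([np.ones_like(s), s, onehot])

pi_0 = np.array([0.5, 0.3, 0.2])  # logging policy behind the historical data
pi_e = np.array([0.1, 0.2, 0.7])  # new policy we want to evaluate

a_h, s_h, r_h = simulate(pi_0, 5_000)  # historical: long-term r is observed
a_x, s_x, _ = simulate(pi_e, 5_000)    # short-term experiment: r not yet observed

# Fit g and h jointly on historical data by ordinary least squares.
coef, *_ = np.linalg.lstsq(design(a_h, s_h), r_h, rcond=None)

# Plug the experiment's (a, s) pairs into the fitted decomposition.
v_decomposed = float(np.mean(design(a_x, s_x) @ coef))

# Ground truth E[r] under pi_e, computed from the toy model.
true_value = float(pi_e @ (0.8 * mean_s + h_true))

print(f"true={true_value:.3f}  decomposed={v_decomposed:.3f}")
```

Because h is estimated from historical data instead of being assumed away, the estimate is not distorted by the action effect that breaks surrogacy. It does, however, inherit any misspecification of the assumed additive form.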
Findings
LOPE outperforms existing methods in synthetic experiments with noisy long-term rewards.
LOPE provides more accurate long-term outcome estimates on real-world music streaming data.
LOPE is effective even when surrogacy assumptions are severely violated.
Abstract
Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term outcomes of interest, making the algorithm selection process unacceptably slow. This work thus studies the problem of feasibly yet accurately estimating the long-term outcome of an algorithm using only historical and short-term experiment data. Existing approaches to this problem either need a restrictive assumption about the short-term outcomes called surrogacy or cannot effectively use short-term outcomes, which is inefficient. Therefore, we propose a new framework called Long-term Off-Policy…
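The two families of existing approaches that the abstract contrasts can be illustrated on a toy problem. The following Python sketch is our own illustration, not code from the paper: the environment, the policies pi_0 and pi_e, and the helper simulate are hypothetical. As an assumption, inverse propensity scoring (IPS) on historical long-term rewards stands in for an approach that cannot use short-term outcomes, and a regression of r on s stands in for a surrogacy-based approach.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3

# Hypothetical toy environment (our assumption): action 2 is "click-bait".
# It maximizes the short-term reward s but has a direct negative effect on
# the long-term reward r that s does not capture, violating surrogacy.
mean_s = np.array([1.0, 2.0, 3.0])
direct_effect = np.array([0.5, 0.0, -2.0])

def simulate(policy, n):
    """Draw n rounds of (action, short-term reward, long-term reward)."""
    a = rng.choice(n_actions, size=n, p=policy)
    s = mean_s[a] + rng.normal(0.0, 1.0, size=n)
    r = 0.8 * s + direct_effect[a] + rng.normal(0.0, 1.0, size=n)
    return a, s, r

pi_0 = np.array([0.5, 0.3, 0.2])  # logging policy behind the historical data
pi_e = np.array([0.1, 0.2, 0.7])  # new policy we want to evaluate

# Historical data: a, s, and r are all observed under pi_0.
a_h, s_h, r_h = simulate(pi_0, 5_000)
# Short-term experiment under pi_e: only the short-term reward is used here.
_, s_x, _ = simulate(pi_e, 5_000)

# Ground truth E[r] under pi_e, computed from the toy model.
true_value = float(pi_e @ (0.8 * mean_s + direct_effect))

# (1) IPS on historical long-term rewards: unbiased here, but it ignores
#     the short-term experiment and its importance weights can be large.
w = pi_e[a_h] / pi_0[a_h]
v_ips = float(np.mean(w * r_h))

# (2) Surrogate-based estimator: regress r on s in the historical data,
#     then average the fitted values over the experiment's short-term rewards.
slope, intercept = np.polyfit(s_h, r_h, deg=1)
v_surrogate = float(np.mean(intercept + slope * s_x))

print(f"true={true_value:.3f}  ips={v_ips:.3f}  surrogate={v_surrogate:.3f}")
```

In this toy setup the surrogate estimate lands well above the true value, because the fitted relationship between s and r credits the click-bait action for its high short-term reward. IPS is centered on the truth but relies only on historical long-term rewards, so it leaves the short-term experiment unused.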
Taxonomy
Topics: Evaluation and Performance Assessment
