Impatient Bandits: Optimizing for the Long-Term Without Delay

Kelly W. Zhang; Thomas Baldwin-McDonald; Kamil Ciosek; Lucas Maystre; Daniel Russo

arXiv:2501.07761·cs.LG·January 15, 2025

TL;DR

This paper introduces a bandit algorithm for content exploration in recommender systems. It targets delayed long-term rewards but learns from short-term surrogate outcomes as they arrive, combining both through a predictive model and a Bayesian filter.

Contribution

It develops a novel predictive model for delayed rewards combined with a Bayesian filter and a bandit algorithm that optimizes long-term outcomes, validated on a large-scale podcast recommendation system.

Findings

01. Significantly outperforms short-term proxy optimization methods.

02. Proves a regret bound based on the Value of Progressive Feedback.

03. Demonstrates improved user engagement in a large-scale A/B test.

Abstract

Increasingly, recommender systems are tasked with improving users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an apparent trade-off in choosing the learning signal: waiting for the full reward to become available might take several weeks, slowing the rate of learning, whereas using short-term proxy rewards reflects the actual long-term goal only imperfectly. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Rewards as well as shorter-term surrogate outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that quickly learns to identify content aligned with long-term success using this new predictive model. We prove a regret bound for our algorithm that depends on…
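
The idea of folding early surrogate signals and delayed rewards into one probabilistic belief, then exploring with that belief, can be illustrated with a small sketch. This is not the paper's model or algorithm: it assumes a scalar Gaussian belief per item, known noise variances, independent noise on the early and the delayed signal, and plain Thompson sampling as the bandit rule. All names (`GaussianBelief`, `thompson_select`, `run_demo`) and all numeric settings are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Belief N(mean, var) over one item's mean long-term reward."""
    mean: float = 0.0
    var: float = 1.0

    def update(self, observation: float, noise_var: float) -> None:
        """Conjugate Gaussian update for observation = reward + noise."""
        precision = 1.0 / self.var + 1.0 / noise_var
        self.mean = (self.mean / self.var + observation / noise_var) / precision
        self.var = 1.0 / precision


def thompson_select(beliefs, rng):
    """Sample one value per item from its belief and pick the largest."""
    samples = [rng.gauss(b.mean, b.var ** 0.5) for b in beliefs]
    return max(range(len(beliefs)), key=samples.__getitem__)


def run_demo(seed=0, rounds=500):
    """Simulate exploration with an early noisy signal and a delayed reward."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]  # hypothetical long-term rewards per item
    proxy_noise_var = 4.0         # early signal: available now, but noisy
    full_noise_var = 0.25         # full reward: precise, but delayed
    delay = 20                    # rounds until the full reward arrives
    beliefs = [GaussianBelief() for _ in true_means]
    pending = []                  # (arrival_round, item, full_reward), FIFO
    pulls = [0] * len(true_means)
    for t in range(rounds):
        # Fold in any delayed rewards that have arrived by now.
        while pending and pending[0][0] <= t:
            _, item, reward = pending.pop(0)
            beliefs[item].update(reward, full_noise_var)
        item = thompson_select(beliefs, rng)
        pulls[item] += 1
        # Update immediately with the noisy short-term signal.
        proxy = rng.gauss(true_means[item], proxy_noise_var ** 0.5)
        beliefs[item].update(proxy, proxy_noise_var)
        # The full reward is queued and only used once the delay has passed.
        full = rng.gauss(true_means[item], full_noise_var ** 0.5)
        pending.append((t + delay, item, full))
    return beliefs, pulls
```

Calling `run_demo()` returns the final beliefs and the number of times each item was selected. The point of the design is that the belief tightens a little as soon as the early signal arrives and a lot once the delayed reward is observed, so exploration does not have to wait for the full delay.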

Taxonomy

Topics: Family and Patient Care in Intensive Care Units · Hospital Admissions and Outcomes · Healthcare Operations and Scheduling Optimization