Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

Thomas M. McDonald; Lucas Maystre; Mounia Lalmas; Daniel Russo; Kamil Ciosek

arXiv:2307.09943 · cs.LG · July 21, 2023

TL;DR

This paper introduces a novel bandit algorithm that predicts delayed rewards to optimize long-term user satisfaction in recommender systems, demonstrated through a podcast recommendation case study.

Contribution

It develops a Bayesian filter-based predictive model for delayed rewards and a bandit algorithm that leverages this model to improve long-term recommendation quality.
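
The contribution above centers on a Bayesian filter that folds partial outcomes into a belief about the delayed reward. A minimal sketch, under hypothetical assumptions not taken from the paper: each show's mean daily-engagement trace has a Gaussian prior; a user's trace is that mean plus correlated Gaussian noise with a known covariance; and the long-term reward is the sum of the trace. A user observed for only k days is then a linear-Gaussian observation of the first k coordinates, and a standard Kalman measurement update conditions the belief on it. The names `kalman_update` and `long_term_belief`, and all numbers below, are illustrative and not drawn from the spotify-research/impatient-bandits repository.

```python
import numpy as np


def kalman_update(mean, cov, y, noise_cov):
    """Condition a Gaussian belief N(mean, cov) over a show's mean daily
    engagement trace (length T) on one user's observed prefix y (length k <= T).

    Assumed observation model: y = H @ mu + eps, where H selects the first
    k days and eps ~ N(0, noise_cov[:k, :k]).
    """
    T, k = len(mean), len(y)
    H = np.eye(T)[:k]                        # picks out the k observed days
    S = H @ cov @ H.T + noise_cov[:k, :k]    # covariance of the predicted prefix
    K = np.linalg.solve(S, H @ cov).T        # Kalman gain: cov @ H.T @ inv(S)
    new_mean = mean + K @ (y - H @ mean)
    new_cov = cov - K @ H @ cov
    return new_mean, new_cov


def long_term_belief(mean, cov):
    """Mean and variance of the long-term reward, taken here as the sum over days."""
    ones = np.ones(len(mean))
    return float(ones @ mean), float(ones @ cov @ ones)


# Hypothetical 4-day horizon with correlated days.
T = 4
prior_mean = np.full(T, 0.3)
prior_cov = 0.02 * np.eye(T) + 0.03 * np.ones((T, T))
noise_cov = 0.1 * np.eye(T) + 0.1 * np.ones((T, T))

# One user seen for only 2 days (partial outcome) vs. one fully observed user.
m_part, c_part = kalman_update(prior_mean, prior_cov, np.array([1.0, 1.0]), noise_cov)
m_full, c_full = kalman_update(prior_mean, prior_cov, np.ones(T), noise_cov)

print(long_term_belief(prior_mean, prior_cov))
print(long_term_belief(m_part, c_part))
print(long_term_belief(m_full, c_full))
```

Because the days are correlated, two days of high engagement already raise the predicted long-term reward for the whole horizon, and the fully observed user shrinks the variance further than the partial one.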

Findings

1. Significantly outperforms methods that optimize short-term proxy rewards.
2. Effectively balances exploration and exploitation despite delayed feedback.
3. Improves long-term engagement in podcast recommendations.

Abstract

Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short- or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit…
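
The abstract's second step is a bandit that uses these beliefs to decide what to recommend next. The sketch below uses Thompson sampling, one standard way to balance exploration and exploitation; treat that choice as an assumption rather than a description of the paper's exact algorithm. For brevity it also assumes, hypothetically, that each user has already been reduced to a single estimate of their long-term reward with a known noise variance: small for users whose full outcome is in, larger for users with only a short-term partial outcome. The names `belief_from_observations` and `thompson_select`, and all numbers, are illustrative.

```python
import numpy as np


def belief_from_observations(prior_mean, prior_var, observations):
    """Conjugate Gaussian belief over one show's mean long-term reward.

    `observations` holds one (estimate, noise_var) pair per user; users with
    only a partial outcome are assumed to carry a larger noise_var.
    """
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for estimate, noise_var in observations:
        precision += 1.0 / noise_var          # each user adds information
        weighted_sum += estimate / noise_var  # precision-weighted evidence
    return weighted_sum / precision, 1.0 / precision


def thompson_select(means, variances, rng):
    """Draw one plausible long-term reward per show and recommend the best."""
    samples = rng.normal(np.asarray(means), np.sqrt(np.asarray(variances)))
    return int(np.argmax(samples))


rng = np.random.default_rng(0)
shows = [
    [(10.0, 4.0), (12.0, 4.0)],  # two users with complete outcomes
    [(15.0, 25.0)],              # one user with only a partial outcome
    [],                          # no data yet: belief stays at the prior
]
beliefs = [belief_from_observations(5.0, 100.0, obs) for obs in shows]
means, variances = zip(*beliefs)
choice = thompson_select(means, variances, rng)
print(beliefs, choice)
```

Show 1 rests on a single partial outcome, so its belief stays wide (variance 20, versus about 2 for show 0) and Thompson sampling keeps selecting it with meaningful probability. In this sketch, that is how partial outcomes can drive exploration without waiting weeks for the full reward.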

Code & Models

Repositories

spotify-research/impatient-bandits (official)