Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts

Giulia Romano; Andrea Agostini; Francesco Trovò; Nicola Gatti; Marcello Restelli

arXiv:2206.00586 · cs.LG · June 2, 2022

TL;DR

This paper introduces a new multi-armed bandit setting where rewards are partitioned over time after an arm pull, proposing algorithms that leverage this partial feedback to improve regret bounds and performance.

Contribution

The paper defines the novel TP-MAB setting, develops two algorithms for it, and demonstrates their superior theoretical and empirical performance under certain reward structures.

Findings

01

The proposed algorithms, TP-UCB-FR and TP-UCB-EW, outperform delayed-feedback bandit algorithms when the reward structure satisfies α-smoothness.

02

The proposed methods achieve better asymptotic regret upper bounds than delayed-feedback bandit algorithms.

03

Empirical results show effectiveness in real-world media recommendation.

Abstract

There is a rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users where their preferences can be collected during the listening of the entire playlist, we study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB), in which the stochastic reward associated with the pull of an arm is partitioned over a finite number of consecutive rounds following the pull. This setting, unexplored so far to the best of our knowledge, is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull instead of being fully disclosed in a single, potentially delayed round. We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by…
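The feedback structure described in the abstract can be illustrated with a small simulation. This is a minimal sketch of the setting only, not the authors' TP-UCB-FR or TP-UCB-EW algorithms. The function name `simulate_tp_feedback`, the Bernoulli partial rewards, the per-arm `means`, and the choice to reveal the first partial reward in the same round as the pull are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate_tp_feedback(pulls, means, tau_max, seed=0):
    """Return, for each round, the partial rewards revealed in that round.

    pulls:   list of arm indices, one pull per round.
    means:   hypothetical per-arm mean of each partial reward (assumption).
    tau_max: number of consecutive rounds over which one pull's reward is spread.
    """
    rng = random.Random(seed)
    # Maps a round index to the list of (pull_round, arm, partial_reward)
    # triples that become visible to the learner in that round.
    revealed = defaultdict(list)
    for t, arm in enumerate(pulls):
        # Assumption: the pull at round t releases one Bernoulli partial
        # reward in each of the rounds t, t + 1, ..., t + tau_max - 1.
        for offset in range(tau_max):
            partial = 1.0 if rng.random() < means[arm] else 0.0
            revealed[t + offset].append((t, arm, partial))
    # Only rounds inside the horizon are reported; later pieces are unseen.
    return [revealed[t] for t in range(len(pulls))]


feedback = simulate_tp_feedback(pulls=[0, 1, 0, 1], means=[0.2, 0.8], tau_max=3)
for t, pieces in enumerate(feedback):
    print(t, pieces)
```

In this sketch, each round after the first few delivers pieces from up to `tau_max` different pulls, so the learner always holds partial information about recent pulls. A delayed-feedback learner, by contrast, would see nothing about a pull until its reward is disclosed in a single round.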

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Optimization and Search Problems