Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

Ronald C. van den Broek; Rik Litjens; Tobias Sagis; Luc Siecker; Nina Verbeeke; Pratik Gajane

arXiv:2211.06883 · cs.LG · November 15, 2022

Open Access

TL;DR

This paper introduces a framework for multi-armed bandits with temporally-partitioned rewards that generalizes how an arm's reward is distributed across rounds, and proposes an algorithm that improves the regret upper bound in some scenarios.

Contribution

It defines the Beta-spread property for how an arm's reward is distributed across rounds, derives a regret lower bound for TP-MAB under this property, and proposes the TP-UCB-FR-G algorithm, which exploits it to improve the regret upper bound in some scenarios.

Findings

01

The Beta-spread property generalizes how an arm's cumulative reward is distributed across rounds.

02

A lower bound is established for TP-MAB under the Beta-spread assumption.

03

The proposed algorithm, TP-UCB-FR-G, improves the regret upper bound in certain scenarios.

Abstract

In this paper, we investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB). In the TP-MAB setting, an agent receives portions of an arm's reward over multiple rounds rather than the entire reward all at once. We introduce a general formulation of how an arm's cumulative reward is distributed across several rounds, called the Beta-spread property. Such a generalization is needed to handle partitioned rewards in which the maximum reward per round is not distributed uniformly across rounds. We derive a lower bound on the TP-MAB problem under the assumption that Beta-spread holds. Moreover, we propose an algorithm, TP-UCB-FR-G, which uses the Beta-spread property to improve the regret upper bound in some scenarios. By generalizing how the cumulative reward is distributed, this setting is applicable in a broader range of…
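To make the feedback model concrete, here is a minimal Python sketch of temporally-partitioned feedback with non-uniform per-round maxima. It is not the paper's formulation: the precise definition of Beta-spread is given in the paper, and the names `partition_reward`, `PartialRewardBuffer`, and the per-round cap vector `caps` are hypothetical. The sketch assumes that a pull at round t yields a cumulative reward z in [0, sum(caps)], revealed in pieces at rounds t, t+1, and so on, with the j-th piece bounded by `caps[j]`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def partition_reward(z: float, caps: list[float]) -> list[float]:
    """Split cumulative reward z into per-round partial rewards.

    Hypothetical rule (an assumption, not the paper's): each offset j gets
    a share proportional to caps[j], so the j-th partial reward never
    exceeds caps[j] whenever 0 <= z <= sum(caps).
    """
    total = sum(caps)
    if not 0.0 <= z <= total:
        raise ValueError("cumulative reward must lie in [0, sum(caps)]")
    return [z * c / total for c in caps]


@dataclass
class PartialRewardBuffer:
    """Hypothetical helper that schedules each pull's partial rewards."""
    caps: list[float]
    pending: dict = field(default_factory=lambda: defaultdict(list))

    def pull(self, arm: int, t: int, z: float) -> None:
        # Partial reward j of a pull made at round t arrives at round t + j.
        for j, part in enumerate(partition_reward(z, self.caps)):
            self.pending[t + j].append((arm, part))

    def observe(self, t: int) -> list[tuple[int, float]]:
        # Return and discard the (arm, partial reward) pairs revealed at round t.
        return self.pending.pop(t, [])


if __name__ == "__main__":
    # Non-uniform per-round maxima (assumed values), so Z_max = 1.0.
    buf = PartialRewardBuffer([0.5, 0.25, 0.25])
    buf.pull(arm=0, t=0, z=0.75)
    buf.pull(arm=1, t=1, z=1.0)
    for t in range(4):
        print(t, buf.observe(t))
    # 0 [(0, 0.375)]
    # 1 [(0, 0.1875), (1, 0.5)]
    # 2 [(0, 0.1875), (1, 0.25)]
    # 3 [(1, 0.25)]
```

Splitting in proportion to the caps is only one simple rule that keeps every piece within its per-round maximum. A bandit algorithm such as TP-UCB-FR-G would consume the pairs returned by `observe` to update its arm estimates; how it does so is specified in the paper, not in this sketch.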

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems