Survival Multiarmed Bandits with Bootstrapping Methods

Peter Veroutis, Frédéric Godin

arXiv:2410.16486 · cs.LG · November 6, 2024

TL;DR

This paper introduces a framework for Survival Multiarmed Bandits (S-MAB) that balances reward maximization against the risk of ruin using bootstrapped reward estimates. In numerical experiments, the resulting policies outperform benchmark policies from the literature.

Contribution

The paper combines a novel bootstrapping-based action value estimator with a dual-objective framework (maximize expected cumulative reward, minimize the probability of ruin) to handle survival constraints in multiarmed bandit problems.
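To make the dual objective concrete, here is one illustrative way to write it. This page does not reproduce the paper's exact formulation, so the symbols below (initial budget B_0, reward r_t at round t, horizon T, ruin time τ, ruin aversion weight λ ≥ 0) and the linear penalty form are assumptions for exposition only.

```latex
B_t = B_0 + \sum_{s=1}^{t} r_s, \qquad
\tau = \inf\{\, t \ge 1 : B_t \le 0 \,\}, \qquad
J_\lambda(\pi) = \mathbb{E}_\pi\!\left[ \sum_{t=1}^{\min(T,\tau)} r_t \right] - \lambda \, \mathbb{P}_\pi(\tau \le T)
```

Under this assumed form, λ = 0 reduces to ordinary expected-reward maximization (with play stopping at ruin), while a larger λ trades expected reward for a lower probability of ruin.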

Findings

01. Outperforms benchmark policies from the literature in numerical experiments

02. Balances reward maximization against the risk of ruin

03. Introduces a new bootstrapping-based action value estimation method
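The bootstrapping idea in the third finding can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `bootstrap_action_value`, the fixed lookahead `horizon`, ruin defined as the budget reaching zero or below, and the linear score `mean_reward - ruin_aversion * p_ruin` are all hypothetical choices. Each bootstrap path resamples one arm's previously observed rewards with replacement and records whether the simulated budget is depleted.

```python
import random
from typing import Optional, Sequence, Tuple


def bootstrap_action_value(
    observed: Sequence[float],
    budget: float,
    horizon: int,
    ruin_aversion: float,
    n_boot: int = 1000,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float, float]:
    """Hypothetical ruin-averse action value for one arm.

    Each bootstrap path draws `horizon` rewards with replacement from the
    arm's observed rewards, adds them to the current budget, and stops early
    if the budget reaches zero or below (ruin).

    Returns (mean_reward, p_ruin, score) with
    score = mean_reward - ruin_aversion * p_ruin.
    """
    if not observed:
        raise ValueError("need at least one observed reward for this arm")
    if rng is None:
        rng = random.Random()
    total_reward = 0.0
    ruined_paths = 0
    for _ in range(n_boot):
        b = budget
        path_reward = 0.0
        for _ in range(horizon):
            r = rng.choice(observed)  # bootstrap: resample with replacement
            path_reward += r
            b += r
            if b <= 0:  # ruin: no further rewards on this path
                ruined_paths += 1
                break
        total_reward += path_reward
    mean_reward = total_reward / n_boot
    p_ruin = ruined_paths / n_boot
    return mean_reward, p_ruin, mean_reward - ruin_aversion * p_ruin


if __name__ == "__main__":
    # Hypothetical reward histories for two arms.
    rng = random.Random(0)
    history = {"safe": [0.1, 0.2, -0.1], "risky": [2.0, -3.0, 1.5]}
    for name, obs in history.items():
        m, p, s = bootstrap_action_value(obs, budget=2.0, horizon=20,
                                         ruin_aversion=10.0, rng=rng)
        print(f"{name}: mean={m:.2f} p_ruin={p:.2f} score={s:.2f}")
```

An agent could compute this score for every arm at each round and pull the arm with the highest value; a larger `ruin_aversion` pushes it toward arms whose resampled reward paths rarely deplete the budget.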

Abstract

The Multiarmed Bandits (MAB) problem has been extensively studied and has seen many practical applications in a variety of fields. The Survival Multiarmed Bandits (S-MAB) open problem is an extension which constrains an agent to a budget that is directly related to observed rewards. As budget depletion leads to ruin, an agent's objective is to both maximize expected cumulative rewards and minimize the probability of ruin. This paper presents a framework that addresses such a dual goal using an objective function balanced by a ruin aversion component. Action values are estimated through a novel approach which consists of bootstrapping samples from previously observed rewards. In numerical experiments, the policies we present outperform benchmarks from the literature.
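The S-MAB setting described in the abstract can be illustrated with a small simulator. This is a hedged sketch, not code from the paper: the class `SurvivalBandit`, the function `run_greedy`, arms modeled as finite lists of equally likely outcomes, and ruin triggered when the budget reaches zero or below are all assumptions. The policy is a plain greedy rule with no ruin aversion, included only to show how every observed reward feeds the budget and how play ends at ruin.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class SurvivalBandit:
    """Hypothetical S-MAB environment: each arm pays a reward drawn uniformly
    from a fixed list of outcomes, and every reward is added to the budget."""
    arms: Sequence[Sequence[float]]
    budget: float
    rng: random.Random = field(default_factory=random.Random)

    @property
    def ruined(self) -> bool:
        # Assumed ruin condition: budget depleted to zero or below.
        return self.budget <= 0

    def pull(self, arm: int) -> float:
        if self.ruined:
            raise RuntimeError("agent is ruined and cannot act")
        reward = self.rng.choice(self.arms[arm])
        self.budget += reward  # budget is tied directly to observed rewards
        return reward


def run_greedy(env: SurvivalBandit, horizon: int) -> Tuple[float, Optional[int]]:
    """Plain greedy baseline: pull each arm once, then the highest sample mean.
    Returns (cumulative reward, ruin round, or None if the agent survived)."""
    history: List[List[float]] = [[] for _ in env.arms]
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= len(env.arms):
            arm = t - 1  # initial round robin so every arm has one sample
        else:
            arm = max(range(len(env.arms)),
                      key=lambda a: sum(history[a]) / len(history[a]))
        r = env.pull(arm)
        history[arm].append(r)
        total += r
        if env.ruined:
            return total, t
    return total, None


if __name__ == "__main__":
    env = SurvivalBandit(arms=[[0.1, 0.2, -0.1], [2.0, -3.0, 1.5]],
                         budget=2.0, rng=random.Random(1))
    print(run_greedy(env, horizon=50))
```

Replacing the greedy `max` with a score that subtracts a weighted estimate of ruin probability from estimated reward is the kind of ruin-averse modification the abstract describes.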

Taxonomy

Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management