Survival Multiarmed Bandits with Bootstrapping Methods
Peter Veroutis, Frédéric Godin

TL;DR
This paper introduces a framework for Survival Multiarmed Bandits that balances reward maximization against the risk of ruin, with action values estimated by bootstrapping previously observed rewards. In numerical experiments, the proposed policies outperform benchmarks from the literature.
Contribution
It proposes a novel approach combining bootstrapping with a dual-objective framework to address survival constraints in multiarmed bandit problems.
Findings
Outperforms existing benchmarks in numerical experiments
Effectively balances reward and risk of ruin
Introduces a new bootstrapping-based action value estimation method
Abstract
The Multiarmed Bandits (MAB) problem has been extensively studied and has seen many practical applications in a variety of fields. The Survival Multiarmed Bandits (S-MAB) open problem is an extension which constrains an agent to a budget that is directly related to observed rewards. As budget depletion leads to ruin, an agent's objective is to both maximize expected cumulative rewards and minimize the probability of ruin. This paper presents a framework that addresses such a dual goal using an objective function balanced by a ruin aversion component. Action values are estimated through a novel approach which consists of bootstrapping samples from previously observed rewards. In numerical experiments, the policies we present outperform benchmarks from the literature.
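To make the idea in the abstract concrete, here is a minimal Python sketch of a bootstrapped, ruin-averse action value. It is not the authors' algorithm: the scoring rule (mean simulated budget gain minus a penalty times the simulated ruin frequency), the function names `bootstrap_arm_score` and `choose_arm`, and the parameters `horizon`, `n_boot`, and `ruin_aversion` are all assumptions introduced for illustration.

```python
import random


def bootstrap_arm_score(rewards, budget, horizon=20, n_boot=500,
                        ruin_aversion=1.0, rng=None):
    """Score one arm by resampling its observed rewards (hypothetical rule).

    Returns (score, mean_gain, ruin_prob), where
    score = mean_gain - ruin_aversion * ruin_prob.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    total_gain = 0.0
    ruins = 0
    for _ in range(n_boot):
        b = budget
        ruined = False
        for _ in range(horizon):
            # Bootstrap step: draw a past reward with replacement.
            b += rng.choice(rewards)
            if b <= 0:  # budget depleted, so this simulated path ends in ruin
                ruined = True
                break
        total_gain += b - budget
        ruins += ruined
    mean_gain = total_gain / n_boot
    ruin_prob = ruins / n_boot
    return mean_gain - ruin_aversion * ruin_prob, mean_gain, ruin_prob


def choose_arm(history, budget, **kwargs):
    """Pick the arm with the highest bootstrapped score.

    `history` maps an arm label to the list of rewards observed for it;
    arms with no observations are skipped in this sketch.
    """
    scores = {arm: bootstrap_arm_score(r, budget, **kwargs)[0]
              for arm, r in history.items() if r}
    return max(scores, key=scores.get)
```

For example, `choose_arm({"safe": [1.0], "bad": [-1.0]}, budget=3)` selects `"safe"`, since every simulated path for `"bad"` exhausts the budget. Raising `ruin_aversion` shifts the choice toward arms whose resampled paths rarely hit zero, which is the trade-off the dual objective is meant to control.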
Taxonomy
Topics: Advanced Bandit Algorithms Research · Cognitive Radio Networks and Spectrum Sensing · Smart Grid Energy Management
