Conservative Bandits
Yifan Wu, Roshan Shariff, Tor Lattimore, Csaba Szepesvári

TL;DR
This paper introduces a multi-armed bandit problem in which a company explores new strategies to maximize revenue while keeping its revenue above a fixed baseline uniformly over time. It analyzes both the stochastic and the adversarial settings, proposing new strategies and proving regret bounds.
Contribution
It proposes new algorithms for constrained bandits in stochastic and adversarial settings, providing theoretical bounds and empirical validation.
Findings
Both high-probability and expectation bounds on the regret are established.
Adversarial setting incurs higher cost for constraint maintenance.
Almost optimal algorithm for the stochastic setting.
Abstract
We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining its revenue above a fixed baseline, uniformly over time. Previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, so the algorithms proposed there are unsuitable under the more stringent constraint considered here. We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price of maintaining the constraints. Amongst other things, we prove both high-probability and expectation bounds on the regret, and we consider maintaining the constraints either with high probability or in expectation. For the adversarial setting the price…
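The difference between holding the revenue constraint uniformly over time and holding it only at a fixed future time can be illustrated with a minimal sketch. The constraint form used here, cumulative revenue of at least (1 - alpha) * baseline * t after t rounds, is an assumption made for illustration, as are the names `alpha`, `baseline`, `holds_uniformly`, and `holds_at_horizon`; none of them appear in this summary.

```python
from itertools import accumulate


def holds_uniformly(rewards, baseline, alpha):
    """Check the constraint after every round t = 1, ..., len(rewards).

    Assumed constraint (illustrative): the running total of rewards must be
    at least (1 - alpha) * baseline * t.
    """
    running_totals = accumulate(rewards)
    return all(
        total >= (1 - alpha) * baseline * t
        for t, total in enumerate(running_totals, start=1)
    )


def holds_at_horizon(rewards, baseline, alpha):
    """Check the same assumed constraint only at the final round."""
    return sum(rewards) >= (1 - alpha) * baseline * len(rewards)


# Hypothetical revenue sequence: early exploration earns nothing,
# later rounds more than make up for it.
explore_first = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

# Hypothetical sequence that never falls below the required level.
cautious = [0.5, 0.5, 0.4, 1.0]

print(holds_at_horizon(explore_first, baseline=0.5, alpha=0.1))
print(holds_uniformly(explore_first, baseline=0.5, alpha=0.1))
print(holds_uniformly(cautious, baseline=0.5, alpha=0.1))
```

In this sketch the first sequence satisfies the fixed-time check but violates the uniform one in its very first round, which is the kind of behavior the uniform requirement is meant to rule out.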
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics
