Conservative Bandits

Yifan Wu; Roshan Shariff; Tor Lattimore; Csaba Szepesvári

arXiv:1602.04282 · stat.ML · February 16, 2016 · 31 cites

Open Access

TL;DR

This paper introduces a new constrained multi-armed bandit problem in which a company explores new strategies to maximize revenue while keeping that revenue above a fixed baseline uniformly over time. It analyzes both the stochastic and adversarial settings, proposing new strategies and regret bounds.

Contribution

It proposes new algorithms for constrained bandits in stochastic and adversarial settings, providing theoretical bounds and empirical validation.

Findings

01. High probability and expectation regret bounds established.

02. Adversarial setting incurs higher cost for constraint maintenance.

03. Almost optimal algorithm for the stochastic setting.

Abstract

We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining its revenue above a fixed baseline, uniformly over time. Previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, so the algorithms previously proposed are unsuitable under the more stringent constraints. We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price of maintaining the constraints. Amongst other things, we prove both high-probability and expectation bounds on the regret, and we consider maintaining the constraints either with high probability or in expectation. For the adversarial setting the price…
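The distinction the abstract draws between maintaining revenue uniformly over time and only at a fixed future time can be illustrated with a small sketch. The constraint form used here (cumulative revenue at least `(1 - alpha) * baseline_per_round * t`), the parameter names, and the sample revenue streams are assumptions made for illustration; they are not taken from the text above.

```python
from itertools import accumulate


def holds_uniformly(rewards, baseline_per_round, alpha):
    """Check the (assumed) constraint at every round t = 1..n.

    Returns True only if the cumulative reward after each round t is at
    least (1 - alpha) * baseline_per_round * t.
    """
    cumulative = accumulate(rewards)
    return all(
        total >= (1 - alpha) * baseline_per_round * t
        for t, total in enumerate(cumulative, start=1)
    )


def holds_at_horizon(rewards, baseline_per_round, alpha):
    """Check the same inequality only at the final round n (the weaker requirement)."""
    n = len(rewards)
    return sum(rewards) >= (1 - alpha) * baseline_per_round * n


# Hypothetical revenue streams: one explores aggressively early on,
# the other stays close to the baseline before improving.
aggressive = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
cautious = [0.5, 0.5, 1.0, 1.0, 1.0, 1.0]

for name, stream in [("aggressive", aggressive), ("cautious", cautious)]:
    print(
        name,
        "uniform:", holds_uniformly(stream, baseline_per_round=0.5, alpha=0.2),
        "horizon:", holds_at_horizon(stream, baseline_per_round=0.5, alpha=0.2),
    )
```

Under these assumed numbers, the aggressive stream satisfies the fixed-time check but violates the uniform one in the first round, which is the kind of behaviour a uniform-over-time constraint rules out.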

Taxonomy

Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Reinforcement Learning in Robotics