Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms
Derek Everett, Fred Lu, Edward Raff, Fernando Camacho, James Holt

TL;DR
This paper introduces a new bandit algorithm capable of efficiently optimizing in non-stationary environments with an extremely large or continuous set of actions, outperforming existing methods in speed and accuracy.
Contribution
The paper presents a novel Gaussian interpolation-based policy that learns continuous reward functions and extends to non-stationary environments, achieving low regret and high computational efficiency.
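As a rough illustration of what a kernel-smoothed policy over a continuous arm space can look like, here is a minimal Python sketch. It is not the authors' algorithm: it assumes arms on a grid in [0, 1], a Gaussian-kernel weighted average as the reward estimate, a UCB-style bonus for exploration, and a discount factor as one generic way to forget old data in a non-stationary setting. The names KernelBandit, run, bandwidth, discount, and bonus are hypothetical and chosen only for this example.

```python
import math
import random


class KernelBandit:
    """Hypothetical Gaussian-kernel bandit over arms on a grid in [0, 1].

    Illustrative only; this is a generic kernel-regression policy,
    not the method described in the paper.
    """

    def __init__(self, n_grid=101, bandwidth=0.05, discount=1.0, bonus=0.5):
        self.grid = [i / (n_grid - 1) for i in range(n_grid)]
        self.bandwidth = bandwidth
        self.discount = discount   # 1.0 keeps all history; < 1.0 forgets old data
        self.bonus = bonus         # scale of the exploration bonus
        self.num = [0.0] * n_grid  # kernel-weighted sums of rewards
        self.den = [0.0] * n_grid  # kernel-weighted observation counts

    def estimate(self, i):
        """Kernel-weighted mean reward at grid index i (0.0 if no data yet)."""
        return self.num[i] / self.den[i] if self.den[i] > 1e-12 else 0.0

    def select(self):
        """Return the arm with the largest optimistic score."""
        best_i, best_score = 0, -math.inf
        for i in range(len(self.grid)):
            # The bonus shrinks as kernel weight accumulates near arm i.
            score = self.estimate(i) + self.bonus / math.sqrt(self.den[i] + 1e-3)
            if score > best_score:
                best_i, best_score = i, score
        return self.grid[best_i]

    def update(self, arm, reward):
        """Discount old statistics, then spread the new reward over nearby arms."""
        for i, g in enumerate(self.grid):
            z = (g - arm) / self.bandwidth
            w = math.exp(-0.5 * z * z)  # Gaussian kernel weight
            self.num[i] = self.discount * self.num[i] + w * reward
            self.den[i] = self.discount * self.den[i] + w


def run(policy, reward_fn, steps):
    """Play `steps` rounds and return the list of chosen arms."""
    arms = []
    for _ in range(steps):
        arm = policy.select()
        policy.update(arm, reward_fn(arm))
        arms.append(arm)
    return arms


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy reward: a noisy peak at x = 0.7.
    noisy_peak = lambda x: 1.0 - abs(x - 0.7) + rng.gauss(0.0, 0.05)
    arms = run(KernelBandit(discount=0.99), noisy_peak, 300)
    print("mean of last 50 arms:", sum(arms[-50:]) / 50)
```

With discount=1.0 the sketch treats the environment as stationary. Setting discount below 1.0 shrinks the accumulated weights each round, so the exploration bonus grows back in regions that have not been sampled recently.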
Findings
Achieves $\tilde{O}(\sqrt{T})$ regret in continuous Lipschitz bandits.
Extends to non-stationary environments with simple modifications.
Runs 100 to 10,000 times faster than existing Gaussian process policies.
Abstract
Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: existing non-stationary bandit policies are designed for a small number of arms, while Lipschitz, linear, and Gaussian process bandit policies are designed to handle a large (or infinite) number of arms in stationary reward environments under constraints on the reward function. In this manuscript, we propose a novel policy to learn reward environments over a continuous space using Gaussian interpolation. We show that our method efficiently learns continuous Lipschitz reward functions with $\tilde{O}(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification. We finally demonstrate that…
Taxonomy
Methods: Gaussian Process
