Quick-Draw Bandits: Quickly Optimizing in Nonstationary Environments with Extremely Many Arms

Derek Everett; Fred Lu; Edward Raff; Fernando Camacho; James Holt

arXiv:2505.24692·cs.LG·June 2, 2025

TL;DR

This paper introduces a new bandit algorithm capable of efficiently optimizing in non-stationary environments with an extremely large or continuous set of actions, outperforming existing methods in speed and accuracy.

Contribution

The paper presents a novel Gaussian interpolation-based policy that learns continuous reward functions and extends to non-stationary environments, achieving low regret and high computational efficiency.
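To make the idea of interpolating rewards over a continuous arm space concrete, here is a minimal sketch and not the authors' algorithm. It assumes arms in [0, 1], a Gaussian kernel smoother (Nadaraya-Watson style) evaluated on a fixed grid, and an optimism bonus that shrinks as kernel-weighted data accumulates. The names `gaussian_weights` and `run_policy`, and the parameters `bandwidth`, `bonus`, and `n_grid`, are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_weights(grid, x, bandwidth):
    """Gaussian kernel weights between every grid point and one played arm x."""
    return np.exp(-0.5 * ((grid - x) / bandwidth) ** 2)

def run_policy(reward_fn, horizon=500, n_grid=201, bandwidth=0.05,
               bonus=0.5, noise=0.1, seed=0):
    """Play `horizon` rounds on [0, 1]; return the grid, reward estimate, and arms played.

    Illustrative assumption: the reward estimate is a kernel-weighted average
    of past observations, and exploration uses a simple count-based bonus.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)
    weighted_sum = np.zeros(n_grid)   # kernel-weighted sum of observed rewards
    weight_total = np.zeros(n_grid)   # kernel-weighted pseudo-count per grid point
    arms = np.empty(horizon)
    for t in range(horizon):
        mean = weighted_sum / np.maximum(weight_total, 1e-12)
        # Optimism: grid points with little nearby data receive a larger bonus.
        index = mean + bonus / np.sqrt(weight_total + 1e-3)
        x = grid[int(np.argmax(index))]
        r = reward_fn(x) + noise * rng.normal()
        w = gaussian_weights(grid, x, bandwidth)
        weighted_sum += w * r
        weight_total += w
        arms[t] = x
    mean = weighted_sum / np.maximum(weight_total, 1e-12)
    return grid, mean, arms

# Usage: a smooth (Lipschitz on [0, 1]) reward with its peak at x = 0.7.
grid, mean, arms = run_policy(lambda x: 1.0 - 4.0 * (x - 0.7) ** 2)
best_arm = grid[int(np.argmax(mean))]
```

Each update costs time linear in the grid size and needs no matrix inversion, which is one plausible reason an interpolation scheme of this kind can be much cheaper than a Gaussian process posterior update.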

Findings

01

Achieves $\tilde{O}(\sqrt{T})$ regret in continuous Lipschitz bandits.

02

Extends to non-stationary environments with simple modifications.

03

Runs 100x to 10,000x faster than existing Gaussian process policies.
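The findings above do not say what the non-stationary modification is. As one possible illustration, and an assumption rather than the paper's stated method, the sketch below applies exponential discounting to kernel-weighted reward statistics so that old observations fade. The function `discounted_update` and the discount factor `gamma` are hypothetical names introduced here.

```python
import numpy as np

def discounted_update(weighted_sum, weight_total, grid, x, reward,
                      bandwidth=0.05, gamma=0.98):
    """One update of kernel-weighted statistics with exponential forgetting.

    Assumption for illustration: past sums are multiplied by gamma (0 < gamma <= 1)
    before the new observation is added; gamma = 1 recovers the stationary update.
    """
    w = np.exp(-0.5 * ((grid - x) / bandwidth) ** 2)
    new_sum = gamma * weighted_sum + w * reward
    new_total = gamma * weight_total + w
    return new_sum, new_total

def track(gamma):
    """Reward at arm 0.5 drops from 1.0 to 0.0 halfway; return the final estimate there."""
    grid = np.linspace(0.0, 1.0, 11)
    s = np.zeros_like(grid)
    n = np.zeros_like(grid)
    for reward in [1.0] * 200 + [0.0] * 200:
        s, n = discounted_update(s, n, grid, 0.5, reward, gamma=gamma)
    return s[5] / n[5]

estimate_discounted = track(gamma=0.98)  # follows the change, ends close to 0
estimate_stationary = track(gamma=1.0)   # plain average over both phases, about 0.5
```

A smaller `gamma` reacts faster to change but leaves fewer effective samples per estimate, so in practice it would need to be tuned to how quickly the environment drifts.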

Abstract

Canonical algorithms for multi-armed bandits typically assume a stationary reward environment where the size of the action space (number of arms) is small. More recently developed methods typically relax only one of these assumptions: existing non-stationary bandit policies are designed for a small number of arms, while Lipschitz, linear, and Gaussian process bandit policies are designed to handle a large (or infinite) number of arms in stationary reward environments under constraints on the reward function. In this manuscript, we propose a novel policy to learn reward environments over a continuous space using Gaussian interpolation. We show that our method efficiently learns continuous Lipschitz reward functions with $O^{*}(\sqrt{T})$ cumulative regret. Furthermore, our method naturally extends to non-stationary problems with a simple modification. We finally demonstrate that…

Taxonomy

Methods: Gaussian Process