Online learning over a finite action set with limited switching
Jason Altschuler, Kunal Talwar

TL;DR
This paper gives the first high probability guarantees for Prediction From Experts with switching costs, and fully characterizes the complexity of switching budgets in both Prediction From Experts and adversarial Multi-Armed Bandits.
Contribution
It introduces the first high probability algorithms for switching costs, and fully characterizes the complexity of switching budgets in online learning and bandits.
Findings
First high probability algorithms achieving optimal regret and switch bounds.
Complete characterization of switching budget complexity for PFE and MAB.
In bandits, the minimax regret rate decays steadily as the number of allowed switches grows.
Abstract
This paper studies the value of switching actions in the Prediction From Experts (PFE) problem and Adversarial Multi-Armed Bandits (MAB) problem. First, we revisit the well-studied and practically motivated setting of PFE with switching costs. Many algorithms are known to achieve the minimax optimal order of $O(\sqrt{T \log n})$ in expectation for both regret and number of switches, where $T$ is the number of iterations and $n$ the number of actions. However, no high probability (h.p.) guarantees are known. Our main technical contribution is the first algorithms which with h.p. achieve this optimal order for both regret and switches. This settles an open problem of [Devroye et al., 2015], and directly implies the first h.p. guarantees for several problems of interest. Next, to investigate the value of switching actions at a more granular level, we introduce the setting of switching…
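The two quantities the abstract trades off, regret and number of switches, can be made concrete with a small sketch. This is an illustration only, not the paper's algorithm: the function names `regret` and `num_switches`, the loss-table layout (one row per iteration, one column per action), and the toy numbers are all assumptions introduced here.

```python
from typing import Sequence


def regret(losses: Sequence[Sequence[float]], actions: Sequence[int]) -> float:
    """Loss of the played actions minus the loss of the best fixed action in hindsight.

    losses[t][i] is the (hypothetical) loss of action i at iteration t.
    """
    n = len(losses[0])
    played = sum(row[a] for row, a in zip(losses, actions))
    best_fixed = min(sum(row[i] for row in losses) for i in range(n))
    return played - best_fixed


def num_switches(actions: Sequence[int]) -> int:
    """Count the iterations at which the action differs from the previous one."""
    return sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)


# Toy instance (assumed data): 4 iterations, 2 actions, losses in [0, 1].
losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
actions = [0, 0, 1, 0]

print(regret(losses, actions))   # played loss 2.0 vs. best fixed loss 1.0
print(num_switches(actions))     # switches at iterations 3 and 4
```

In a switching-cost setting each counted switch would be charged on top of the loss, while in a switching-budget setting the count would be capped; the sketch only measures both quantities for a given action sequence.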
