An Algorithm for Stochastic and Adversarial Bandits with Switching Costs
Chloé Rouyer, Yevgeny Seldin, Nicolò Cesa-Bianchi

TL;DR
This paper introduces an algorithm for multiarmed bandits with switching costs that comes with regret guarantees in both stochastic and adversarial settings and requires no prior knowledge of the regime or time horizon.
Contribution
It adapts the Tsallis-INF algorithm of Zimmert and Seldin (2021) to handle switching costs, proves regret bounds for the oblivious adversarial and stochastically constrained adversarial regimes, and extends the approach to time-varying switching costs.
Findings
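Achieves the minimax optimal regret bound in the oblivious adversarial regime and a gap-dependent regret bound in the stochastically constrained adversarial regime, which includes the stochastic regime as a special case.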
Performs competitively with baseline algorithms in various settings.
Handles environments with changing switching costs over time.
Abstract
We propose an algorithm for stochastic and adversarial multiarmed bandits with switching costs, where the algorithm pays a price $\lambda$ every time it switches the arm being played. Our algorithm is based on adaptation of the Tsallis-INF algorithm of Zimmert and Seldin (2021) and requires no prior knowledge of the regime or time horizon. In the oblivious adversarial setting it achieves the minimax optimal regret bound of $O\big((\lambda K)^{1/3} T^{2/3} + \sqrt{KT}\big)$, where $T$ is the time horizon and $K$ is the number of arms. In the stochastically constrained adversarial regime, which includes the stochastic regime as a special case, it achieves a regret bound of $O\big(\big((\lambda K)^{2/3} T^{1/3} + \ln T\big) \sum_{i \neq i^*} \Delta_i^{-1}\big)$, where $\Delta_i$ are the suboptimality gaps and $i^*$ is a unique optimal arm. In the special case of $\lambda = 0$ (no…
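To make the cost model in the abstract concrete, the following is a minimal Python sketch of how a player's total cost could be tallied when every change of arm is charged a fixed price. It is an illustration only, not the paper's algorithm: the function name `cumulative_cost`, the parameter name `lam` for the per-switch price, and the choice not to charge the very first pull as a switch are all assumptions made here.

```python
from typing import Sequence


def cumulative_cost(
    losses: Sequence[Sequence[float]],
    arms: Sequence[int],
    lam: float,
) -> float:
    """Total loss of the played arms plus `lam` for every arm switch.

    losses[t][a] is the loss of arm a in round t (hypothetical layout).
    arms[t] is the arm played in round t.
    Assumption: the first pull is not counted as a switch.
    """
    if len(arms) > len(losses):
        raise ValueError("need one loss vector per played round")
    # Loss actually incurred by the arm played in each round.
    loss_total = sum(losses[t][a] for t, a in enumerate(arms))
    # A switch happens whenever consecutive rounds use different arms.
    switches = sum(1 for prev, cur in zip(arms, arms[1:]) if prev != cur)
    return loss_total + lam * switches


# Usage: two arms, three rounds, one switch (round 2 -> round 3).
example_losses = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
example_cost = cumulative_cost(example_losses, arms=[0, 0, 1], lam=0.5)
```

Regret with switching costs then compares this quantity against the cumulative loss of the best fixed arm, which never pays for a switch under the assumption above.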
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference
