Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching
Quan Nguyen, Shinji Ito, Junpei Komiyama, Nishant A. Mehta

TL;DR
This paper introduces real-time stability-penalty matching (SPM), a method for multi-armed bandits whose regret bounds are $T$-optimal in both the stochastic and adversarial regimes while adapting to data-dependent quantities such as sparsity, variations, and small losses.
Contribution
The paper develops real-time SPM, a new approach that provides data-dependent, best-of-both-worlds (BOBW), and $T$-optimal regret bounds in multi-armed bandits, extending follow-the-regularized-leader (FTRL) techniques.
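For readers unfamiliar with FTRL in the bandit setting, the sketch below shows a generic baseline, not the paper's method: Exp3, which is FTRL with a negative-entropy regularizer, a fixed learning rate, and importance-weighted loss estimates. The function name `exp3`, the `loss_fn` callback, and the specific learning rate are assumptions made for this illustration. Real-time SPM instead tunes the learning rate adaptively, and that tuning rule is not reproduced here.

```python
import math
import random


def exp3(loss_fn, num_arms, horizon, seed=0):
    """Generic Exp3 baseline (illustrative only, NOT real-time SPM).

    loss_fn(t, arm) is a hypothetical callback returning a loss in [0, 1].
    Returns (total incurred loss, list of pull counts per arm).
    """
    rng = random.Random(seed)
    # A standard fixed learning rate for Exp3 (an assumption of this sketch).
    eta = math.sqrt(2.0 * math.log(num_arms) / (horizon * num_arms))
    cum_est = [0.0] * num_arms  # cumulative importance-weighted loss estimates
    pulls = [0] * num_arms
    total_loss = 0.0
    for t in range(horizon):
        # FTRL with an entropy regularizer: p_i proportional to exp(-eta * L_hat_i).
        low = min(cum_est)  # shift by the minimum for numerical stability
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        z = sum(weights)
        probs = [w / z for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        pulls[arm] += 1
        # Only the pulled arm is observed; dividing by its probability
        # keeps the loss estimate unbiased.
        cum_est[arm] += loss / probs[arm]
    return total_loss, pulls


if __name__ == "__main__":
    # Toy instance: arm 0 always has loss 0.1, every other arm 0.9.
    total, pulls = exp3(lambda t, arm: 0.1 if arm == 0 else 0.9,
                        num_arms=3, horizon=2000, seed=1)
    print(total, pulls)
```

With a fixed learning rate like this one, the baseline targets the adversarial worst case only. Choosing the learning rate from observed data is what the data-dependent and BOBW guarantees described here rely on.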
Findings
Achieves $O(\sqrt{T})$ worst-case regret in the adversarial regime.
Achieves $O(\ln T)$ regret in the stochastic regime.
Adapts to data-dependent quantities such as sparsity, variations, and small losses.
Abstract
Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandit problems have limited adaptivity: they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent, or have a sub-optimal worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds, and $T$-optimal for multi-armed bandit problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $\sqrt{T}$ in the adversarial regime and $\ln T$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization
