Data-dependent Bounds with $T$-Optimal Best-of-Both-Worlds Guarantees in Multi-Armed Bandits using Stability-Penalty Matching

Quan Nguyen; Shinji Ito; Junpei Komiyama; Nishant A. Mehta

arXiv:2502.08143 · cs.LG · February 13, 2025

TL;DR

This paper introduces real-time stability-penalty matching (SPM), a method for multi-armed bandits whose regret bounds have the optimal dependence on $T$ in both the stochastic and adversarial regimes while adapting to data-dependent quantities.

Contribution

The paper develops real-time SPM, an approach that yields regret bounds for multi-armed bandits that are simultaneously data-dependent, best-of-both-worlds (BOBW), and $T$-optimal, extending follow-the-regularized-leader (FTRL) techniques.

Findings

01 Achieves $O(\sqrt{T})$ worst-case regret in the adversarial regime.

02 Achieves $O(\ln T)$ worst-case regret in the stochastic regime.

03 Adapts to data-dependent quantities such as sparsity, variations, and small losses.
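
As a reminder of what these rates measure, the standard notion of (pseudo-)regret over $T$ rounds can be written as below. The notation ($K$ arms, losses $\ell_{t,i} \in [0,1]$, chosen arm $I_t$) is assumed here for illustration and is not taken from this page.

$$
R_T = \mathbb{E}\left[\sum_{t=1}^{T} \ell_{t,I_t}\right] - \min_{i \in \{1,\dots,K\}} \mathbb{E}\left[\sum_{t=1}^{T} \ell_{t,i}\right]
$$

Under this definition, the rates above read $R_T = O(\sqrt{T})$ in the adversarial regime and $R_T = O(\ln T)$ in the stochastic regime, viewed as functions of $T$ with the other problem parameters suppressed.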

Abstract

Existing data-dependent and best-of-both-worlds regret bounds for multi-armed bandit problems have limited adaptivity: they are either data-dependent but not best-of-both-worlds (BOBW), BOBW but not data-dependent, or have a sub-optimal $O(\sqrt{T \ln T})$ worst-case guarantee in the adversarial regime. To overcome these limitations, we propose real-time stability-penalty matching (SPM), a new method for obtaining regret bounds that are simultaneously data-dependent, best-of-both-worlds, and $T$-optimal for multi-armed bandit problems. In particular, we show that real-time SPM obtains bounds with worst-case guarantees of order $O(\sqrt{T})$ in the adversarial regime and $O(\ln T)$ in the stochastic regime while simultaneously being adaptive to data-dependent quantities such as sparsity, variations, and small losses. Our results are obtained by extending the SPM technique for tuning the…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Age of Information Optimization