Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions

Saeed Masoudian; Yevgeny Seldin

arXiv:2103.12487·cs.LG·September 14, 2021·1 cites

Open Access

TL;DR

This paper derives tighter regret bounds for the Tsallis-INF algorithm in adversarial regimes with a self-bounding constraint, a class that covers stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions.

Contribution

The paper provides improved regret bounds for Tsallis-INF in adversarial and stochastic bandits with constraints and corruptions, and extends the analysis to more general settings beyond multi-armed bandits.

Findings

01

Achieves tighter regret bounds in adversarial and stochastic regimes.

02

Unified analysis covering stochastic, adversarial, and corrupted bandit settings.

03

Extends the theoretical framework to broader classes of bandit problems.

Abstract

We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves an $O\left(\left(\sum_{i \neq i^{*}} \frac{1}{\Delta_{i}}\right) \log_{+}\left(\frac{(K-1)T}{\left(\sum_{i \neq i^{*}} \frac{1}{\Delta_{i}}\right)^{2}}\right) + \sqrt{C \left(\sum_{i \neq i^{*}} \frac{1}{\Delta_{i}}\right) \log_{+}\left(\frac{(K-1)T}{C \sum_{i \neq i^{*}} \frac{1}{\Delta_{i}}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_{i}$ are the suboptimality gaps, $i^{*}$ is the best arm, $C$ is the corruption magnitude, and $\log_{+}(x) = \max(1, \log x)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis,…
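
To get a feel for how the two terms of the bound scale, the expression inside $O(\cdot)$ can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names `log_plus` and `bound_expression` are hypothetical, constants hidden by the $O(\cdot)$ notation are ignored, the corruption term is assumed to sit under a square root, and the term is taken to be zero when $C = 0$ (an assumption made here so that the purely stochastic case can be evaluated).

```python
import math


def log_plus(x):
    """log_+(x) = max(1, log x), as defined in the abstract; x must be positive."""
    return max(1.0, math.log(x))


def bound_expression(gaps, T, C):
    """Evaluate the expression inside O(.) for the given problem parameters.

    gaps: suboptimality gaps Delta_i of the K - 1 suboptimal arms (all > 0).
    T:    time horizon.
    C:    corruption magnitude (C >= 0).
    Constants hidden by the O(.) notation are NOT included.
    """
    K = len(gaps) + 1                      # number of arms, including the best arm
    S = sum(1.0 / d for d in gaps)         # sum over i != i* of 1 / Delta_i
    gap_term = S * log_plus((K - 1) * T / S ** 2)
    if C == 0:
        # Assumption of this sketch: the corruption term vanishes when C = 0.
        corruption_term = 0.0
    else:
        corruption_term = math.sqrt(C * S * log_plus((K - 1) * T / (C * S)))
    return gap_term + corruption_term


if __name__ == "__main__":
    gaps = [0.5, 0.5]                      # hypothetical three-armed instance
    for C in (0, 100):
        print(C, bound_expression(gaps, T=10_000, C=C))
```

Comparing the output for `C=0` and `C=100` shows how much the corruption term adds on top of the gap-dependent term for a fixed horizon; the numbers are only meaningful up to the constants that the bound leaves unspecified.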

Taxonomy

Topics: Statistical Mechanics and Entropy · Forecasting Techniques and Applications · Advanced Bandit Algorithms Research