Improved Analysis of the Tsallis-INF Algorithm in Stochastically Constrained Adversarial Bandits and Stochastic Bandits with Adversarial Corruptions
Saeed Masoudian, Yevgeny Seldin

TL;DR
This paper tightens the regret bounds for the Tsallis-INF algorithm in adversarial regimes with a self-bounding constraint, and gives a general analysis that carries the same improvement to settings beyond multi-armed bandits.
Contribution
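The paper provides improved regret bounds for Tsallis-INF in adversarial regimes with a self-bounding constraint, which include stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. It also extends the analysis to more general settings beyond multi-armed bandits.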
Findings
Tighter regret bounds for Tsallis-INF in adversarial regimes with a self-bounding constraint.
A single analysis covering stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions.
A general analysis that extends the improvement to settings beyond multi-armed bandits.
Abstract
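We derive improved regret bounds for the Tsallis-INF algorithm of Zimmert and Seldin (2021). We show that in adversarial regimes with a $(\Delta, C, T)$ self-bounding constraint the algorithm achieves $\mathcal{O}\left(\left(\sum_{i \neq i^*} \frac{1}{\Delta_i}\right) \log_+\left(\frac{(K-1)T}{\left(\sum_{i \neq i^*} \frac{1}{\Delta_i}\right)^2}\right) + \sqrt{C \left(\sum_{i \neq i^*} \frac{1}{\Delta_i}\right) \log_+\left(\frac{(K-1)T}{C \sum_{i \neq i^*} \frac{1}{\Delta_i}}\right)}\right)$ regret bound, where $T$ is the time horizon, $K$ is the number of arms, $\Delta_i$ are the suboptimality gaps, $i^*$ is the best arm, $C$ is the corruption magnitude, and $\log_+(x) = \max(1, \log x)$. The regime includes stochastic bandits, stochastically constrained adversarial bandits, and stochastic bandits with adversarial corruptions as special cases. Additionally, we provide a general analysis,…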
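To get a feel for how the bound in the abstract scales, the following is a minimal sketch that evaluates its two terms numerically. It assumes the bound has the form $\left(\sum_{i \neq i^*} 1/\Delta_i\right) \log_+(\cdot) + \sqrt{C \left(\sum_{i \neq i^*} 1/\Delta_i\right) \log_+(\cdot)}$ and drops the constants hidden by $\mathcal{O}(\cdot)$, so the outputs indicate growth rates only, not actual regret values. The function names `log_plus` and `bound_terms` are hypothetical and chosen for illustration; they do not come from the paper.

```python
import math


def log_plus(x: float) -> float:
    """log_+(x) = max(1, log x). Returns 1 for non-positive x (a convention assumed here)."""
    return max(1.0, math.log(x)) if x > 0 else 1.0


def bound_terms(gaps, best_arm, T, C):
    """Evaluate the two terms of the bound, with O(.) constants omitted.

    gaps     : suboptimality gaps Delta_i, one per arm (the best arm's entry is ignored)
    best_arm : index i* of the best arm
    T        : time horizon
    C        : corruption magnitude (C = 0 gives the purely stochastic case)
    """
    K = len(gaps)
    # S = sum over suboptimal arms of 1 / Delta_i
    inv_gap_sum = sum(1.0 / d for i, d in enumerate(gaps) if i != best_arm)

    # First term: S * log_+((K - 1) T / S^2)
    gap_term = inv_gap_sum * log_plus((K - 1) * T / inv_gap_sum ** 2)

    # Second term: sqrt(C * S * log_+((K - 1) T / (C * S))); it vanishes when C = 0
    if C > 0:
        corruption_term = math.sqrt(
            C * inv_gap_sum * log_plus((K - 1) * T / (C * inv_gap_sum))
        )
    else:
        corruption_term = 0.0

    return gap_term, corruption_term
```

For example, `bound_terms([0.0, 0.5, 0.5], best_arm=0, T=10_000, C=100)` returns the gap-dependent term and the corruption-dependent term for a three-armed instance. Comparing the two shows which one dominates for a given corruption level.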
Taxonomy
Topics: Statistical Mechanics and Entropy · Forecasting Techniques and Applications · Advanced Bandit Algorithms Research
