A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates
Wei-Cheng Lee, Francesco Orabona

TL;DR
This paper gives a simplified proof of the best-of-both-worlds guarantees of the Tsallis-INF algorithm for multi-armed bandits. The proof avoids conjugate functions and favors a short argument over optimized constants.
Contribution
It offers a new, streamlined derivation of the guarantees for Tsallis-INF that relies on modern tools from online convex optimization and simplifies the proof structure.
Findings
Simplified proof of Tsallis-INF guarantees
Avoids use of conjugate functions in proof
Favors a shorter proof over optimized constants in the bounds
Abstract
In this short note, we present a simple derivation of the best-of-both-worlds guarantee for the Tsallis-INF multi-armed bandit algorithm of Zimmert and Seldin (J. Zimmert and Y. Seldin. Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits. Journal of Machine Learning Research, 22(28):1-49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf). In particular, the proof uses modern tools from online convex optimization and avoids the use of conjugate functions. Also, we do not optimize the constants in the bounds, in favor of a slimmer proof.
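For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal Python sketch of Tsallis-INF viewed as follow-the-regularized-leader with a 1/2-Tsallis entropy regularizer and importance-weighted loss estimates. It is an illustration under stated assumptions, not the authors' code: the learning rate eta_t = 2/sqrt(t), the bisection solver (used here in place of the Newton iteration usually described), and the names tsallis_inf_weights, run_tsallis_inf, and loss_fn are all choices made for this sketch.

```python
import math
import random


def tsallis_inf_weights(cum_losses, eta, iters=100):
    """Return w on the simplex with w_i = 4 / (eta * (L_i - x))^2.

    The normalizer x < min(L) is found by bisection (an assumption of
    this sketch; any root finder for sum_i w_i(x) = 1 would do).
    """
    k = len(cum_losses)
    lmin = min(cum_losses)
    # At x = lmin - 2/eta the best arm alone has weight 1, so the sum is >= 1.
    # At x = lmin - 2*sqrt(k)/eta every arm has weight <= 1/k, so the sum is <= 1.
    lo = lmin - 2.0 * math.sqrt(k) / eta
    hi = lmin - 2.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(4.0 / (eta * (L - mid)) ** 2 for L in cum_losses)
        if total > 1.0:
            hi = mid  # weights too large: move x further below the losses
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    w = [4.0 / (eta * (L - x)) ** 2 for L in cum_losses]
    s = sum(w)
    return [wi / s for wi in w]  # remove residual bisection error


def run_tsallis_inf(loss_fn, k, horizon, seed=0):
    """Play `horizon` rounds; loss_fn(t, arm) must return a loss in [0, 1]."""
    rng = random.Random(seed)
    cum_est = [0.0] * k  # cumulative importance-weighted loss estimates
    pulls = [0] * k
    for t in range(1, horizon + 1):
        eta = 2.0 / math.sqrt(t)  # assumed learning-rate schedule
        w = tsallis_inf_weights(cum_est, eta)
        arm = rng.choices(range(k), weights=w)[0]
        loss = loss_fn(t, arm)
        cum_est[arm] += loss / w[arm]  # unbiased estimate of the arm's loss
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical environment: arm 0 always has loss 0, the others loss 1.
    counts = run_tsallis_inf(lambda t, a: 0.0 if a == 0 else 1.0, k=3, horizon=2000)
    print(counts)
```

The same update is used whether the losses are stochastic or adversarial, which is the sense in which the guarantee is best-of-both-worlds; the sketch does not attempt to reproduce the regret analysis itself.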
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques
