Last-Iterate Analyses of FTRL with the 1/2-Tsallis Entropy in Stochastic Bandits
Jingxin Zhan, Yuze Han, Zhihua Zhang

TL;DR
This paper analyzes the last-iterate convergence of FTRL with the 1/2-Tsallis entropy regularizer in stochastic bandits. It shows that the Bregman divergence decays at a rate of t^{-1/2}, which partially confirms the intuition that logarithmic regret goes hand in hand with fast last-iterate convergence.
Contribution
It provides the first theoretical analysis of the last-iterate convergence rate for FTRL with 1/2-Tsallis entropy in stochastic bandits.
Findings
Bregman divergence decays at a rate of t^{-1/2}.
Intuitively, logarithmic regret should correspond to a t^{-1} last-iterate convergence rate.
Partially confirms the intuition connecting regret and convergence rate.
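The intuition behind the t^{-1} figure can be sketched with a back-of-the-envelope calculation. The notation here is introduced for illustration only and is not taken from the paper: r_t is the expected regret incurred in round t, R_T is the cumulative regret after T rounds, and c is a constant.

```latex
R_T = \sum_{t=1}^{T} r_t,
\qquad
r_t \approx \frac{c}{t}
\;\Longrightarrow\;
R_T \approx c \sum_{t=1}^{T} \frac{1}{t} = c \,\bigl(\ln T + O(1)\bigr).
```

So a per-round rate of order t^{-1} is consistent with logarithmic cumulative regret. The converse does not hold in general: a logarithmic bound on R_T only constrains the r_t on average, which is why the last iterate needs its own analysis.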
Abstract
The convergence analysis of online learning algorithms is central to machine learning theory, where the last-iterate convergence is particularly important, as it captures the learner's actual decisions and describes the evolution of the learning process over time. However, in multi-armed bandits, most existing algorithmic analyses mainly focus on the order of regret, while the last-iterate (simple regret) convergence rate remains less explored -- especially for the widely studied Follow-the-Regularized-Leader (FTRL) algorithms. Recently, FTRL with the 1/2-Tsallis entropy regularizer (the 1/2-Tsallis-INF algorithm, arXiv:1807.07623) was shown to achieve logarithmic regret in stochastic bandits. Nevertheless, its last-iterate convergence rate has not yet been studied. Intuitively, logarithmic regret should correspond to a last-iterate…
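To make the objects in the abstract concrete, here is a minimal Python sketch of an FTRL update with a 1/2-Tsallis regularizer, together with a Bregman divergence to the point mass on the best arm. Several choices are assumptions made for illustration and may differ from the paper's exact setting: the regularizer scaling psi(p) = -4 * sum(sqrt(p_i)), the learning rate 2/sqrt(t), the importance-weighted loss estimator, Bernoulli losses, and the direction in which the divergence is measured. The function names are hypothetical.

```python
import math
import random


def tsallis_inf_probs(cum_losses, eta, iters=100):
    """FTRL step with psi(p) = -4 * sum(sqrt(p_i)) on the simplex (assumed scaling).

    The minimizer has the form p_i = 4 / (eta * (L_i - x))**2, where the
    scalar x < min(L) is chosen so that the p_i sum to one.
    """
    k = len(cum_losses)
    l_min = min(cum_losses)
    # Total mass is <= 1 at the lower bracket and >= 1 at the upper bracket.
    lo = l_min - 2.0 * math.sqrt(k) / eta
    hi = l_min - 2.0 / eta
    for _ in range(iters):  # bisection on the normalization constant x
        mid = 0.5 * (lo + hi)
        mass = sum(4.0 / (eta * (l - mid)) ** 2 for l in cum_losses)
        if mass > 1.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    probs = [4.0 / (eta * (l - x)) ** 2 for l in cum_losses]
    total = sum(probs)
    return [p / total for p in probs]  # remove residual bisection error


def bregman_to_point_mass(probs, best):
    """D_psi(e_best, p) for psi(p) = -4 * sum(sqrt(p_i)), in closed form."""
    return (2.0 * sum(math.sqrt(p) for p in probs)
            + 2.0 / math.sqrt(probs[best]) - 4.0)


def run(mean_losses, horizon, seed=0):
    """Simulate the sketch on Bernoulli losses; return the divergence per round."""
    rng = random.Random(seed)
    k = len(mean_losses)
    best = min(range(k), key=lambda i: mean_losses[i])
    cum_losses = [0.0] * k
    divergences = []
    for t in range(1, horizon + 1):
        eta = 2.0 / math.sqrt(t)  # assumed learning-rate schedule
        probs = tsallis_inf_probs(cum_losses, eta)
        divergences.append(bregman_to_point_mass(probs, best))
        arm = rng.choices(range(k), weights=probs)[0]
        loss = 1.0 if rng.random() < mean_losses[arm] else 0.0
        cum_losses[arm] += loss / probs[arm]  # importance-weighted estimate
    return divergences
```

Calling `run([0.2, 0.8], 10000)` returns one divergence value per round. Averaging such runs over many seeds and plotting against t on a log-log scale gives an informal picture of the decay, though a simulation of this kind illustrates the quantities involved and does not verify the paper's rate.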
