Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Jincheng Mei; Bo Dai; Alekh Agarwal; Sharan Vaswani; Anant Raj; Csaba Szepesvari; Dale Schuurmans

arXiv:2502.07141 · cs.LG · February 12, 2025

TL;DR

This paper proves that stochastic gradient bandit algorithms with any constant learning rate almost surely converge to a globally optimal policy, even without standard smoothness or noise assumptions.

Contribution

The paper gives a new theoretical analysis showing that the stochastic gradient bandit algorithm converges almost surely to a globally optimal policy for arbitrary constant learning rates.

Findings

01 Converges almost surely to a globally optimal policy with any constant learning rate

02 Balances exploration and exploitation even when standard smoothness and noise control assumptions break down

03 Extends the understanding of how simple stochastic gradient methods behave in bandit settings

Abstract

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using any constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
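For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of a stochastic gradient bandit loop under stated assumptions, not the authors' code. It assumes the commonly used softmax parameterization with the update theta(a) += eta * (1[a == a_t] - pi(a)) * r_t and no baseline. The Bernoulli reward model, the function names `softmax` and `run_gradient_bandit`, and all parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def run_gradient_bandit(mean_rewards, eta, steps, seed=0):
    """Run a softmax stochastic gradient bandit with constant learning rate eta.

    Assumption (illustrative): rewards are Bernoulli with the given means.
    Returns the final policy as a list of action probabilities.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k  # logits; the initial policy is uniform

    for _ in range(steps):
        pi = softmax(theta)
        # Sample one action from the current policy.
        a = rng.choices(range(k), weights=pi)[0]
        # Observe a noisy reward for the sampled action only.
        r = 1.0 if rng.random() < mean_rewards[a] else 0.0
        # Stochastic gradient step on every logit, using the sampled reward.
        for i in range(k):
            indicator = 1.0 if i == a else 0.0
            theta[i] += eta * (indicator - pi[i]) * r

    return softmax(theta)


if __name__ == "__main__":
    # eta is deliberately large here; the paper's claim concerns any constant value.
    policy = run_gradient_bandit([0.2, 0.5, 0.9], eta=10.0, steps=5000, seed=1)
    print(policy)
```

The paper's guarantee is asymptotic and almost sure, so a single finite run like this one illustrates the update rule rather than demonstrating convergence.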

Videos

Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research