Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions
Tianyu Wang, Yasong Feng

TL;DR
This paper establishes convergence rates for Stochastic Zeroth-order Gradient Descent (SZGD) algorithms applied to Łojasiewicz functions, and shows that function values can converge faster than the iterates for both smooth and nonsmooth objectives.
Contribution
It provides the first convergence rate analysis of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions, including nonsmooth cases, expanding understanding of zeroth-order optimization.
Findings
Function values can converge faster than the iterates.
Convergence rates hold for both smooth and nonsmooth functions.
Results apply to a broad class of Łojasiewicz functions.
Abstract
We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $\widehat{\nabla} f (\mathbf{x}_t)$ is the approximate gradient estimated using zeroth-order information only. Our results show that the function values $\{ f(\mathbf{x}_t) \}$ can converge faster than the iterates $\{ \mathbf{x}_t \}$, regardless of whether the objective $f$ is smooth or nonsmooth.
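The SZGD iteration in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the gradient estimator below is a common two-point estimator that averages finite differences along random unit directions, and the names `zeroth_order_gradient`, `szgd`, `delta`, and `k` are hypothetical. The step size is held constant for simplicity, and the test objective is a quadratic, which satisfies the Łojasiewicz inequality with exponent 1/2.

```python
import numpy as np


def zeroth_order_gradient(f, x, delta, k, rng):
    """Estimate the gradient of f at x from function evaluations only.

    Assumption: a two-point estimator over k random unit directions.
    The paper's estimator may differ.
    """
    n = x.size
    # Draw k directions uniformly from the unit sphere in R^n.
    v = rng.standard_normal((k, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    # Symmetric finite differences of f along each direction.
    diffs = np.array([f(x + delta * vi) - f(x - delta * vi) for vi in v])
    # The factor n / (2 * delta * k) makes the estimate unbiased for quadratics.
    return (n / (2.0 * delta * k)) * (diffs[:, None] * v).sum(axis=0)


def szgd(f, x0, eta, delta, k, steps, seed=0):
    """Run x_{t+1} = x_t - eta * grad_estimate(x_t) with a constant step size."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    values = [f(x)]
    for _ in range(steps):
        x = x - eta * zeroth_order_gradient(f, x, delta, k, rng)
        values.append(f(x))
    return x, values


def quadratic(x):
    """Illustrative objective with minimizer at the origin."""
    return 0.5 * float(np.dot(x, x))


x_final, values = szgd(quadratic, x0=[1.0, -2.0, 0.5],
                       eta=0.1, delta=1e-3, k=3, steps=500)
```

Recording `values` alongside the iterate makes it straightforward to compare how quickly the function values and the distances to the minimizer shrink on a given objective.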
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Stochastic processes and financial applications
