Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods
Zijian Liu, Zhengyuan Zhou

TL;DR
This paper provides a unified analysis of the last-iterate convergence of stochastic gradient methods, extending results to broader settings including non-Euclidean norms, composite objectives, and heavy-tailed noise.
Contribution
It introduces a comprehensive framework for proving last-iterate convergence rates of SGD under general conditions, overcoming previous limitations such as bounded noise and compact domains.
Findings
Unified convergence analysis for general domains and objectives
Extension to non-Euclidean norms and composite optimization
Convergence guarantees under heavy-tailed and sub-Weibull noise
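To make the setting concrete, here is a minimal sketch that compares the last iterate of SGD with the averaged iterate on a Lipschitz convex function over an unbounded domain with unbounded noise. Everything in it is an illustrative assumption rather than the paper's method or experiments: the objective f(x) = |x|, the Gaussian noise model, the step size eta / sqrt(t), and the function name `sgd_last_vs_average`.

```python
import math
import random


def sgd_last_vs_average(T=2000, x0=5.0, eta=1.0, noise_std=1.0, seed=0):
    """Run stochastic subgradient descent on f(x) = |x| over the whole real line.

    Returns (f(x_last), f(x_avg)): the objective at the final iterate x_{T+1}
    and at the average of the iterates x_1, ..., x_T.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x = x0
    running_sum = 0.0
    for t in range(1, T + 1):
        running_sum += x                                   # accumulate x_t before the update
        subgrad = (x > 0) - (x < 0)                        # a subgradient of |x| (0 at x = 0)
        noisy_grad = subgrad + rng.gauss(0.0, noise_std)   # unbounded Gaussian noise (assumed)
        x -= (eta / math.sqrt(t)) * noisy_grad             # step eta / sqrt(t), no projection
    x_avg = running_sum / T
    return abs(x), abs(x_avg)


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        last, avg = sgd_last_vs_average(T=T)
        print(f"T={T:6d}  f(last)={last:.4f}  f(avg)={avg:.4f}")
```

No projection step is used, so the domain is not compact, and the Gaussian noise is not almost surely bounded. These are the two restrictions the paper aims to remove. A single run with a fixed seed only illustrates the setup; it says nothing about rates.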
Abstract
In the past several years, the last-iterate convergence of the Stochastic Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding. For Lipschitz convex functions, different works have established the optimal or high-probability convergence rates for the final iterate, where $T$ is the time horizon and $\delta$ is the failure probability. However, to prove these bounds, all the existing works are either limited to compact domains or require almost surely bounded noise. It is natural to ask whether the last iterate of SGD can still guarantee the optimal convergence rate but without these two restrictive assumptions. Besides this important question, there are still lots of theoretical problems lacking an answer. For example, compared with the…
Peer Reviews
Decision: ICLR 2024 poster
As implicitly stated in my "Summary", I do think that the goal of the paper is interesting. A result involving high probability guarantees for the last iterate of SGD for convex problems is interesting in my opinion. The proofs are comprehensive and mostly carefully written (even though there are readability issues I will expand on in the later parts of my review). Having a unified result is also nice to cover different important settings.
Even though I think the main result, high-probability rates for the last iterate of SGD without bounded domains, is interesting and worthy of acceptance (once the correctness is verified), there are many issues with the writing of the paper and proofs that should also be addressed and that prevented me from verifying the correctness. Right now, the repeating theme in the paper is for the authors to spend way too much time and effort to show their improvements in marginal cases, which confuses rea
The work presents high-probability and in-expectation convergence results for the last iterate of SGD on general domains for convex or strongly convex objectives.
I find the work incremental compared to the literature. In fact, the work generalises the convergence results for the last iterate of SGD for convex or strongly convex objectives to general domains that are not necessarily compact. I find the content of the paper better suited for publication in a math/optimisation journal than at ICLR.
This work makes a solid contribution to the understanding of the last-iterate convergence of stochastic gradient methods, which is an important problem in convex optimization and is of particular interest to the ML community, since in practice the theoretically sub-optimal choice of the last iterate is cheaper and thus more popular. The technical results are general and cover a wide range of settings, bypassing a few constraints of previous works including assumptions on compact domain and
I do not find any substantial weakness. One limitation of this work is that it lacks a proof sketch or a discussion of the main idea in the main text. I believe the paper can benefit from adding a simple example of $z_t$, explaining how it is used to exploit convexity. Doing so can improve readability by giving readers more intuition about the design of $z_t$ and how it works.
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
