Convergence Rate Analysis of LION
Yiming Dong, Huan Li, Zhouchen Lin

TL;DR
This paper provides a rigorous convergence rate analysis of the LION optimizer, showing it converges at an optimal rate to critical points in nonconvex stochastic optimization, supported by extensive experiments.
Contribution
It establishes the first comprehensive convergence rate for LION, which matches the theoretical lower bounds, and reports experiments in which LION shows performance advantages over SGD.
Findings
LION converges to KKT points at rate O(√d K^{-1/4})
LION outperforms SGD in loss and performance
Empirical gradient norm ratios match theoretical predictions
Abstract
The LION (evoLved sIgn mOmeNtum) optimizer for deep neural network training was found by Google via program search. It uses a simple sign update yet shows impressive performance in training large-scale networks. Although previous studies have investigated its convergence properties, a comprehensive analysis, especially of the convergence rate, is still desirable. Recognizing that LION can be regarded as solving a specific constrained problem, this paper focuses on demonstrating its convergence to the Karush-Kuhn-Tucker (KKT) point at the rate of O(√d K^{-1/4}) measured by gradient norm, where d is the problem dimension and K is the number of iteration steps. Going a step further, we remove the constraint and establish that LION converges to the critical point of the general unconstrained problem at the same rate. This rate not only delivers the currently optimal…
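For readers unfamiliar with the sign update mentioned in the abstract, the following is a minimal NumPy sketch of one LION step as it is commonly stated (sign of an interpolation between momentum and gradient, decoupled weight decay, then a momentum refresh). The function name `lion_step`, the default hyperparameters, and the noisy quadratic toy problem are assumptions made for illustration; they are not taken from this page and are not the paper's experimental setup.

```python
import numpy as np

def lion_step(x, m, grad, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One LION update (illustrative sketch); returns new parameters and momentum."""
    # Interpolate between momentum and the current gradient, keep only the sign.
    update = np.sign(beta1 * m + (1.0 - beta1) * grad)
    # Decoupled weight decay is added to the sign update before scaling by lr.
    x_new = x - lr * (update + weight_decay * x)
    # Momentum is refreshed with a second interpolation factor.
    m_new = beta2 * m + (1.0 - beta2) * grad
    return x_new, m_new

# Toy problem (assumption): f(x) = 0.5 * ||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
x = rng.normal(size=10)
m = np.zeros_like(x)
for _ in range(2000):
    grad = x + 0.1 * rng.normal(size=x.shape)  # stochastic gradient of f
    x, m = lion_step(x, m, grad, lr=1e-3)

# For this f the exact gradient equals x, so its norm is the norm of x.
print("gradient norm after training:", np.linalg.norm(x))
```

Because every coordinate moves by exactly `lr` per step (before weight decay), the step size bounds how close the iterate can settle to a critical point, which is one reason the analysis of sign-based methods differs from that of SGD.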
Taxonomy
Topics: Engineering Applied Research
Methods: Stochastic Gradient Descent · Evolved Sign Momentum
