Convergence Rate Analysis of LION

Yiming Dong; Huan Li; Zhouchen Lin

arXiv:2411.07724 · cs.LG · November 13, 2024

TL;DR

This paper provides a rigorous convergence rate analysis of the LION optimizer, showing it converges at an optimal rate to critical points in nonconvex stochastic optimization, supported by extensive experiments.

Contribution

It establishes the first comprehensive convergence rate for LION, matching the theoretical lower bounds and confirming empirical performance advantages over SGD.

Findings

01

LION converges to KKT points at rate $O(\sqrt{d} K^{-1/4})$, measured by the gradient $\ell_1$ norm

02

LION outperforms SGD in loss and performance

03

Empirical gradient norm ratios match theoretical predictions

Abstract

The LION (evoLved sIgn mOmeNtum) optimizer for deep neural network training was found by Google via program search; despite its simple sign update, it shows impressive performance in training large-scale networks. Although previous studies have investigated its convergence properties, a comprehensive analysis, especially of the convergence rate, is still desirable. Recognizing that LION can be regarded as solving a specific constrained problem, this paper focuses on demonstrating its convergence to the Karush-Kuhn-Tucker (KKT) point at the rate of $O(\sqrt{d} K^{-1/4})$ measured by the gradient $\ell_1$ norm, where $d$ is the problem dimension and $K$ is the number of iteration steps. Going a step further, we remove the constraint and establish that LION converges to the critical point of the general unconstrained problem at the same rate. This rate not only delivers the currently optimal…
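For readers unfamiliar with the "simple sign update" mentioned in the abstract, the following is a minimal sketch of one LION step as the update rule is commonly stated, written in plain Python. It is illustrative only and is not taken from this paper: the function name `lion_step`, the helper `sign`, and the default values (`beta1=0.9`, `beta2=0.99`, decoupled `weight_decay`) are assumptions chosen for the example.

```python
from typing import List, Tuple


def sign(v: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of v."""
    return float((v > 0) - (v < 0))


def lion_step(
    x: List[float],
    m: List[float],
    g: List[float],
    lr: float = 1e-4,
    beta1: float = 0.9,          # assumed default, not from this page
    beta2: float = 0.99,         # assumed default, not from this page
    weight_decay: float = 0.0,   # decoupled weight decay (assumption)
) -> Tuple[List[float], List[float]]:
    """One sketch LION step: returns (new parameters, new momentum)."""
    # Direction: sign of an interpolation between momentum and gradient.
    update = [sign(beta1 * mi + (1.0 - beta1) * gi) for mi, gi in zip(m, g)]
    # Every coordinate moves by exactly lr (plus the weight-decay term).
    new_x = [xi - lr * (ui + weight_decay * xi) for xi, ui in zip(x, update)]
    # The momentum is updated with a second, slower interpolation factor.
    new_m = [beta2 * mi + (1.0 - beta2) * gi for mi, gi in zip(m, g)]
    return new_x, new_m


# Two gradients that differ in scale by many orders of magnitude
# lead to the same parameter step, because only the sign is used.
small, _ = lion_step([1.0], [0.0], [1e-3], lr=0.1)
large, _ = lion_step([1.0], [0.0], [1e6], lr=0.1)
print(small, large)
```

Because each coordinate moves by the same magnitude regardless of gradient scale, it is natural for the analysis to measure progress in the gradient $\ell_1$ norm, which for any vector $g$ in dimension $d$ satisfies $\|g\|_2 \le \|g\|_1 \le \sqrt{d}\,\|g\|_2$.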

Taxonomy

Topics: Engineering Applied Research

Methods: Stochastic Gradient Descent · Evolved Sign Momentum