Complexities of Armijo-like algorithms in Deep Learning context
Bensaid Bilel (IMB)

TL;DR
This paper investigates Armijo-like algorithms in Deep Learning, showing that some variants achieve acceleration and optimal complexity under (L0, L1) smoothness and analyticity, assumptions better suited to highly non-convex problems than standard smoothness.
Contribution
The work extends Armijo algorithm analysis to Deep Learning contexts, establishing new complexity bounds under (L0, L1) smoothness and analyticity assumptions.
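For orientation, the (L0, L1) smoothness condition is usually stated in the literature as below for a twice-differentiable objective f. This is the common formulation, given here as an assumption; the paper may work with a variant (for example, a first-order version that does not require the Hessian).

```latex
\|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| \qquad \text{for all } x .
```

Setting L1 = 0 recovers the standard smoothness assumption with constant L0, so the condition is a strict relaxation of it.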
Findings
Some Armijo variants achieve acceleration and optimal complexities under assumptions suited to Deep Learning.
New complexity bounds depend on the smoothness constants and the initial gap.
The analysis gives theoretical support for the efficiency of Armijo-like conditions on highly non-convex problems.
Abstract
The classical Armijo backtracking algorithm achieves, like gradient descent, the optimal complexity for smooth functions, but without any hyperparameter tuning. However, the smoothness assumption is not suitable for Deep Learning optimization. In this work, we show that some variants of the Armijo optimizer achieve acceleration and optimal complexities under assumptions better suited to Deep Learning: the (L0, L1) smoothness condition and analyticity. New dependences on the smoothness constants and the initial gap are established. The results theoretically highlight the efficiency of Armijo-like conditions for highly non-convex problems.
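As an illustration of the classical scheme the abstract starts from, here is a minimal sketch of gradient descent with Armijo backtracking. It is not one of the variants analysed in the paper. The function name `armijo_gd` and the parameters `eta0`, `c`, `shrink`, `tol` and `max_iter` are hypothetical choices made for this example. Each trial step eta is accepted once f(x - eta g) <= f(x) - c eta ||g||^2, where g is the gradient at x.

```python
import numpy as np


def armijo_gd(f, grad, x0, eta0=1.0, c=1e-4, shrink=0.5, tol=1e-8, max_iter=1000):
    """Gradient descent with classical Armijo backtracking (illustrative sketch).

    All parameter names and defaults are assumptions made for this example.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        g2 = float(g @ g)          # squared gradient norm
        if g2 <= tol ** 2:         # stop once the gradient norm is below tol
            break
        fx = f(x)
        eta = eta0                 # restart from the initial trial step
        # Shrink eta until the sufficient-decrease (Armijo) condition holds.
        while f(x - eta * g) > fx - c * eta * g2:
            eta *= shrink
        x = x - eta * g
    return x
```

For example, calling `armijo_gd` on the quadratic f(x) = 0.5 (x1^2 + 10 x2^2) from the starting point (3, -2) drives the iterates toward the minimiser at the origin without any knowledge of the smoothness constant, which is the "no hyperparameter tuning" property the abstract refers to.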
Taxonomy
Topics: Advanced Statistical Methods and Models
