Simba: A Scalable Bilevel Preconditioned Gradient Method for Fast Evasion of Flat Areas and Saddle Points
Nick Tsipinakis, Panos Parpas

TL;DR
Simba is a scalable preconditioned gradient method designed to efficiently escape saddle points and flat regions in high-dimensional non-convex optimization, improving convergence and generalization in machine learning tasks.
Contribution
It introduces a simple, scalable preconditioning approach using a moving average of gradients, linked with multilevel optimization techniques for efficient saddle point escape.
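The summary above does not spell out Simba's update rule, so the following is only a minimal sketch of the general idea of scaling gradient steps with a moving average of gradient information. It assumes a diagonal, RMSProp-style preconditioner built from an exponential moving average of squared gradients, which is a common stand-in and not the authors' bilevel construction. The test function f, the starting point, and the names gradient_descent and ema_preconditioned_descent are hypothetical and chosen for illustration: f has a saddle at the origin and minima at (0, 1) and (0, -1).

```python
import numpy as np


def f(p):
    """Toy objective (assumption): saddle at (0, 0), minima at (0, +1) and (0, -1)."""
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad(p):
    """Analytic gradient of the toy objective f."""
    x, y = p
    return np.array([x, y**3 - y])


def gradient_descent(p0, lr=0.1, steps=50):
    """Plain gradient descent: the step is proportional to the raw gradient."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - lr * grad(p)
    return p


def ema_preconditioned_descent(p0, lr=0.1, beta=0.9, eps=1e-8, steps=50):
    """Gradient descent with a diagonal preconditioner (RMSProp-style assumption).

    The preconditioner is the inverse square root of an exponential moving
    average of squared gradients, so coordinates with tiny gradients (flat
    directions near a saddle) still receive steps of order lr.
    """
    p = np.array(p0, dtype=float)
    v = np.zeros_like(p)  # moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad(p)
        v = beta * v + (1.0 - beta) * g**2
        v_hat = v / (1.0 - beta**t)  # bias correction for the zero initialisation
        p = p - lr * g / (np.sqrt(v_hat) + eps)
    return p


if __name__ == "__main__":
    start = [1.0, 1e-6]  # almost exactly on the saddle's attracting axis
    for name, method in [("gradient descent", gradient_descent),
                         ("EMA-preconditioned", ema_preconditioned_descent)]:
        p = method(start)
        print(f"{name}: point={p}, f={f(p):.6f}")
```

Started almost on the axis that leads into the saddle, plain gradient descent spends its whole budget near the origin because the gradient along the escape direction is tiny, while the rescaled step leaves the flat direction within a few iterations. A diagonal scaling like this ignores curvature coupling between coordinates, so it should be read as an intuition aid rather than a reproduction of the method.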
Findings
Verifies scalability and effectiveness near saddle points and flat areas.
Demonstrates satisfactory generalization on benchmark residual networks.
Shows a linear convergence rate for strongly convex functions.
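For reference, the last finding can be read against the standard textbook definitions. The summary does not give the paper's exact statement, constants, or assumptions, so the symbols below (\mu, \rho, x^*) are generic notation rather than the authors' own.

```latex
% mu-strong convexity of a differentiable f (standard definition):
f(y) \;\ge\; f(x) + \nabla f(x)^{\top}(y - x) + \frac{\mu}{2}\,\lVert y - x \rVert^{2}
\qquad \text{for all } x, y, \quad \mu > 0 .

% Linear (geometric) convergence of the iterates x_k to the minimiser x^*:
f(x_k) - f(x^*) \;\le\; \rho^{\,k}\,\bigl(f(x_0) - f(x^*)\bigr),
\qquad 0 < \rho < 1 .
```

Under such a bound, reaching accuracy \varepsilon takes on the order of \log(1/\varepsilon) iterations.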
Abstract
The convergence behaviour of first-order methods can be severely slowed down when applied to high-dimensional non-convex functions due to the presence of saddle points. If, additionally, the saddles are surrounded by large plateaus, it is highly likely that the first-order methods will converge to sub-optimal solutions. In machine learning applications, sub-optimal solutions mean poor generalization performance. They are also related to the issue of hyper-parameter tuning, since, in the pursuit of solutions that yield lower errors, a tremendous amount of time is required on selecting the hyper-parameters appropriately. A natural way to tackle the limitations of first-order methods is to employ the Hessian information. However, methods that incorporate the Hessian do not scale or, if they do, they are very slow for modern applications. Here, we propose Simba, a scalable preconditioned…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Graphene research and applications
