Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training
Yue Hu, Zanxia Cao, Yingchao Liu

TL;DR
This paper introduces Dimer-Enhanced Optimization (DEO), a physics-inspired first-order method that efficiently escapes saddle points in neural network training by approximating curvature without full Hessian computation.
Contribution
The paper proposes DEO, a novel first-order optimization framework that adapts the Dimer method to improve navigation of complex loss landscapes in neural networks.
Findings
DEO effectively escapes saddle points in neural network training.
Preliminary results show DEO improves training efficiency on a Transformer toy model.
DEO offers a computationally feasible way to incorporate curvature information in large-scale models.
Abstract
First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network…
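The curvature probe described in the abstract can be illustrated with a small sketch. This is not the authors' DEO algorithm, only the generic finite-difference idea behind a dimer: evaluate the gradient at two points separated by a small distance along a unit direction, and project the gradient difference onto that direction. The toy loss f(x, y) = x^2 - y^2, the function names, and the spacing `eps` are all assumptions made for illustration.

```python
import math


def grad_saddle(p):
    """Gradient of the toy loss f(x, y) = x**2 - y**2 (saddle at the origin).

    Hypothetical example function, not taken from the paper.
    """
    x, y = p
    return [2.0 * x, -2.0 * y]


def dimer_curvature(grad_fn, center, direction, eps=1e-3):
    """Estimate the curvature along `direction` using two gradient calls.

    The two dimer endpoints sit at center +/- eps * n, where n is the
    normalized direction. The estimate is
        (g(center + eps*n) - g(center - eps*n)) . n / (2 * eps),
    which approximates n^T H n without ever forming the Hessian H.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    n = [d / norm for d in direction]
    p_plus = [c + eps * d for c, d in zip(center, n)]
    p_minus = [c - eps * d for c, d in zip(center, n)]
    g_plus = grad_fn(p_plus)
    g_minus = grad_fn(p_minus)
    return sum((gp - gm) * d for gp, gm, d in zip(g_plus, g_minus, n)) / (2.0 * eps)


if __name__ == "__main__":
    point = [0.3, 0.1]
    # Along x the toy loss curves upward; along y it curves downward.
    print(dimer_curvature(grad_saddle, point, [1.0, 0.0]))  # close to +2
    print(dimer_curvature(grad_saddle, point, [0.0, 1.0]))  # close to -2
```

A negative estimate along some direction signals that the loss bends downward there, which is the kind of information a gradient-only optimizer cannot see at a saddle point. Each estimate costs two gradient evaluations, regardless of the number of parameters.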
