Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training

Yue Hu; Zanxia Cao; Yingchao Liu

arXiv:2507.19968 · cs.LG · July 29, 2025

TL;DR

This paper introduces Dimer-Enhanced Optimization (DEO), a physics-inspired first-order method that efficiently escapes saddle points in neural network training by approximating curvature without full Hessian computation.
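The curvature estimate behind this summary can be written as a worked formula. It follows the classical Dimer construction. The symbols f (loss), N (unit dimer axis), Δ (half-length of the dimer) and C (curvature along N) are our own notation and may differ from the paper's:

```latex
H(x)\,N \;\approx\; \frac{\nabla f(x + \Delta N) - \nabla f(x - \Delta N)}{2\Delta},
\qquad
C(N) \;=\; N^{\top} H(x)\,N \;\approx\; \frac{\bigl(\nabla f(x + \Delta N) - \nabla f(x - \Delta N)\bigr)^{\top} N}{2\Delta},
\qquad \lVert N \rVert = 1 .
```

Each estimate needs two gradient evaluations and O(d) memory for a model with d parameters. Storing the full Hessian would take O(d²). Minimizing C(N) over unit vectors N approximates the lowest Hessian eigenvalue. A negative minimum reveals a direction of negative curvature, as at a saddle point.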

Contribution

The paper proposes DEO, a novel first-order optimization framework that adapts the Dimer method to improve navigation of complex loss landscapes in neural networks.

Findings

1. DEO effectively escapes saddle points in neural network training.
2. Preliminary results show DEO improves training efficiency on a Transformer toy model.
3. DEO offers a computationally feasible way to incorporate curvature information in large-scale models.

Abstract

First-order optimization methods, such as SGD and Adam, are widely used for training large-scale deep neural networks due to their computational efficiency and robust performance. However, relying solely on gradient information, these methods often struggle to navigate complex loss landscapes with flat regions, plateaus, and saddle points. Second-order methods, which use curvature information from the Hessian matrix, can address these challenges but are computationally infeasible for large models. The Dimer method, a first-order technique that constructs two closely spaced points to probe the local geometry of a potential energy surface, efficiently estimates curvature using only gradient information. Inspired by its use in molecular dynamics simulations for locating saddle points, we propose Dimer-Enhanced Optimization (DEO), a novel framework to escape saddle points in neural network…
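The Dimer idea described in the abstract can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the DEO algorithm itself, because the paper's update rule is not given in this excerpt. The function names `dimer_curvature` and `lowest_curvature_direction` are hypothetical. The curvature estimate uses the classical two-point (central-difference) Dimer construction. The axis is rotated with plain projected gradient descent on the Rayleigh quotient, a simplification that stands in for the classical Dimer rotation line search.

```python
import numpy as np


def dimer_curvature(grad_fn, x, n, delta=1e-3):
    """Estimate the curvature n^T H n at x from two gradient calls.

    The dimer endpoints are x + delta*n and x - delta*n, with n a unit vector.
    """
    g_plus = grad_fn(x + delta * n)
    g_minus = grad_fn(x - delta * n)
    hn = (g_plus - g_minus) / (2.0 * delta)  # central-difference estimate of H n
    return float(n @ hn), hn


def lowest_curvature_direction(grad_fn, x, n0, delta=1e-3, steps=200, lr=0.1):
    """Rotate the dimer axis toward the lowest-curvature direction at x.

    Uses projected gradient descent on the Rayleigh quotient n^T H n over unit
    vectors (a simplification of the classical Dimer rotation step).
    """
    n = n0 / np.linalg.norm(n0)
    for _ in range(steps):
        c, hn = dimer_curvature(grad_fn, x, n, delta)
        f_rot = hn - c * n          # component of H n perpendicular to n
        n = n - lr * f_rot          # turn the axis toward lower curvature
        n = n / np.linalg.norm(n)   # keep the dimer axis a unit vector
    c, _ = dimer_curvature(grad_fn, x, n, delta)
    return c, n


# Toy saddle f(x, y) = x^2 - y^2 at the origin; its Hessian is diag(2, -2).
def grad_saddle(p):
    return np.array([2.0 * p[0], -2.0 * p[1]])


c, n = lowest_curvature_direction(grad_saddle, np.zeros(2), np.array([1.0, 1.0]))
print(c, n)  # c < 0 flags a saddle; n points (up to sign) along the y axis
```

On this toy saddle, the sketch recovers curvature -2 along the y axis using only gradient calls. A negative value flags the saddle, and the returned direction is one along which the loss decreases. The step size `lr` must stay small relative to the spread of curvatures. The value 0.1 is adequate here, but a real network would need tuning.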
