Neural network optimization strategies and the topography of the loss landscape

Jianneng Yu; Alexandre V. Morozov

arXiv:2602.21276 · cs.LG · February 26, 2026

Open Access

TL;DR

This paper compares stochastic gradient descent and quasi-Newton methods for neural network training, revealing how the choice of optimizer shapes the loss-landscape topography of the solutions it finds and how well those solutions generalize.

Contribution

It introduces a new algorithm, FourierPathFinder, for analyzing paths across the loss landscape, and demonstrates how the choice of optimizer affects the depth and transferability of the resulting solutions.

Findings

1. SGD solutions have lower loss barriers and generalize better.
2. Quasi-Newton optimization finds deeper, isolated minima.
3. The optimization method affects the topography and robustness of neural network solutions.
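Finding 1 concerns loss barriers between solutions. The paper analyzes paths with its own FourierPathFinder algorithm, which is not described on this page, so the sketch below swaps in a common, simpler proxy: the barrier along a straight line between two parameter vectors, defined here (as an assumption) as the maximum loss on the discretized segment minus the larger of the two endpoint losses. The `double_well` loss and all helper names are hypothetical toys, not taken from the paper.

```python
def interpolate(theta_a, theta_b, t):
    """Point (1 - t) * theta_a + t * theta_b on the straight segment between two solutions."""
    return [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]


def linear_barrier(loss, theta_a, theta_b, n_points=101):
    """Highest loss on a discretized straight path, minus the higher endpoint loss."""
    ts = [k / (n_points - 1) for k in range(n_points)]
    path_max = max(loss(interpolate(theta_a, theta_b, t)) for t in ts)
    return path_max - max(loss(theta_a), loss(theta_b))


def double_well(theta):
    """Hypothetical 2-D toy loss with two minima, at (-1, 0) and (1, 0)."""
    x, y = theta
    return (x * x - 1.0) ** 2 + y * y


# Two minima in different wells vs. two points inside the same well.
print(linear_barrier(double_well, [-1.0, 0.0], [1.0, 0.0]))  # prints 1.0
print(linear_barrier(double_well, [0.9, 0.0], [1.1, 0.0]))   # prints 0.0
```

Because a straight segment is only one of many connecting paths, this estimate is an upper bound on the lowest barrier achievable over all paths between the same two points.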

Abstract

Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD), a global optimization algorithm for non-convex problems that relies only on the gradient of the objective function. We contrast SGD solutions with those obtained via a non-stochastic quasi-Newton method, which uses curvature information to determine the step direction and Golden Section Search to choose the step size. We use several computational tools to investigate neural network parameters obtained by these two optimization methods, including kernel Principal Component Analysis and a…
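The abstract contrasts SGD, which uses only stochastic gradients, with a deterministic quasi-Newton method that picks the step direction from curvature information and the step size by Golden Section Search. The sketch below illustrates those two update rules on a hypothetical toy least-squares problem. The quasi-Newton variant (BFGS with an inverse-Hessian update), the line-search interval, the learning rate, and every helper name are assumptions for illustration, not the authors' implementation. Because the toy loss is convex, it shows only the mechanics of each optimizer and cannot reproduce the paper's landscape findings.

```python
import math
import random

# Hypothetical toy data (not from the paper): y = 2x - 1 plus small Gaussian noise.
random.seed(0)
XS = [i / 10 for i in range(-10, 11)]
YS = [2.0 * x - 1.0 + random.gauss(0.0, 0.05) for x in XS]


def sample_grad(theta, x, y):
    """Gradient of the per-sample loss 0.5 * (w*x + b - y)**2 with respect to (w, b)."""
    r = theta[0] * x + theta[1] - y
    return [r * x, r]


def full_loss(theta):
    """Mean squared-error loss over the whole toy dataset."""
    return sum(0.5 * (theta[0] * x + theta[1] - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def full_grad(theta):
    """Full-batch gradient: the average of the per-sample gradients."""
    grads = [sample_grad(theta, x, y) for x, y in zip(XS, YS)]
    return [sum(g[k] for g in grads) / len(XS) for k in range(2)]


def sgd(theta, lr=0.1, epochs=200, seed=1):
    """Plain SGD: one randomly ordered sample per step, gradient information only."""
    rng = random.Random(seed)
    theta = list(theta)
    order = list(range(len(XS)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = sample_grad(theta, XS[i], YS[i])
            theta = [t - lr * gk for t, gk in zip(theta, g)]
    return theta


def golden_section(f, a, b, tol=1e-8):
    """Minimize a unimodal 1-D function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2.0


def bfgs(theta, iters=50, max_step=10.0, gtol=1e-10):
    """Deterministic BFGS: curvature-based direction, golden-section step size."""
    n = len(theta)
    theta = list(theta)
    H = [[float(i == j) for j in range(n)] for i in range(n)]  # inverse-Hessian estimate
    g = full_grad(theta)
    for _ in range(iters):
        if math.sqrt(sum(gk * gk for gk in g)) < gtol:
            break
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]  # search direction
        alpha = golden_section(
            lambda a: full_loss([t + a * pk for t, pk in zip(theta, p)]), 0.0, max_step)
        s = [alpha * pk for pk in p]
        new_theta = [t + sk for t, sk in zip(theta, s)]
        new_g = full_grad(new_theta)
        y = [gn - go for gn, go in zip(new_g, g)]
        sy = sum(sk * yk for sk, yk in zip(s, y))
        if sy > 1e-12:  # curvature condition keeps H positive definite
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yk * hk for yk, hk in zip(y, Hy))
            H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                  + (rho * rho * yHy + rho) * s[i] * s[j]
                  for j in range(n)] for i in range(n)]
        theta, g = new_theta, new_g
    return theta


theta0 = [0.0, 0.0]
for name, theta in [("SGD", sgd(theta0)), ("BFGS", bfgs(theta0))]:
    print(name, [round(v, 3) for v in theta], f"loss={full_loss(theta):.6f}")
```

Each SGD step looks at a single sample with a fixed learning rate, whereas each quasi-Newton step uses the full-batch gradient, rescales it by the running inverse-Hessian estimate `H`, and then spends several loss evaluations on a one-dimensional golden-section line search.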

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks