Neural network optimization strategies and the topography of the loss landscape
Jianneng Yu, Alexandre V. Morozov

TL;DR
This paper compares stochastic gradient descent and quasi-Newton methods in neural network training, revealing how different optimization strategies influence the landscape of solutions and their generalization capabilities.
Contribution
It introduces a novel algorithm, FourierPathFinder, to analyze loss landscape paths and demonstrates how optimizer choice affects solution depth and transferability.
Findings
SGD solutions are separated by lower loss barriers than quasi-Newton solutions and are more generalizable.
Quasi-Newton solutions find deeper, isolated minima.
Optimization method impacts the topography and robustness of neural network solutions.
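To make the notion of a "barrier" between two solutions concrete, here is a minimal sketch that measures the loss barrier along a straight line joining two parameter vectors. This is not the paper's FourierPathFinder, whose details are not given on this page; the straight-line path, the function names, the definition of barrier height (peak loss on the path minus the higher endpoint loss), and the toy double-well loss are all assumptions made for illustration.

```python
def interpolate(w_a, w_b, t):
    """Point on the straight line from w_a (t=0) to w_b (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(w_a, w_b)]


def barrier_height(loss, w_a, w_b, n_points=101):
    """Peak loss along the straight path minus the higher endpoint loss.

    Assumption: other definitions (e.g. relative to the mean endpoint loss)
    are equally plausible; this one is chosen only for illustration.
    """
    losses = [
        loss(interpolate(w_a, w_b, i / (n_points - 1)))
        for i in range(n_points)
    ]
    return max(losses) - max(losses[0], losses[-1])


def double_well(w):
    """Hypothetical toy loss with minima at w[0] = -1 and w[0] = +1."""
    return (w[0] ** 2 - 1) ** 2


barrier = barrier_height(double_well, [-1.0], [1.0])
print(barrier)
```

A straight line generally overestimates the barrier, since a curved path may pass through lower-loss regions; a path-finding method would search over such curved paths instead.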
Abstract
Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD) - a non-convex global optimization algorithm which relies only on the gradient of the objective function. We contrast SGD solutions with those obtained via a non-stochastic quasi-Newton method, which utilizes curvature information to determine step direction and Golden Section Search to choose step size. We use several computational tools to investigate neural network parameters obtained by these two optimization methods, including kernel Principal Component Analysis and a…
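The abstract mentions choosing the step size by Golden Section Search. The sketch below shows how such a one-dimensional search could pick a step along a given descent direction. It is a generic textbook version, not the authors' implementation: the function names, the search interval `[0, max_step]`, the tolerance, and the quadratic toy loss are assumptions, and the search is only guaranteed to find the minimum when the loss is unimodal along the direction.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618


def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal scalar function f on the interval [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


def line_search_step(loss, w, direction, max_step=1.0):
    """Step size in [0, max_step] minimizing the loss along `direction`."""
    def along(t):
        return loss([wi + t * di for wi, di in zip(w, direction)])
    return golden_section_search(along, 0.0, max_step)


def toy_loss(w):
    """Hypothetical quadratic loss with its minimum at (1, -2)."""
    return (w[0] - 1) ** 2 + (w[1] + 2) ** 2


# Search along the negative gradient of toy_loss at the origin.
step = line_search_step(toy_loss, [0.0, 0.0], [2.0, -4.0])
print(step)
```

Each iteration reuses one of the two interior evaluations, so the search needs only one new loss evaluation per step. In a quasi-Newton method the direction would come from a curvature-based approximation rather than the raw negative gradient used in this toy example.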
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
