How Does Adaptive Optimization Impact Local Neural Network Geometry?
Kaiqi Jiang, Dhruv Malik, Yuanzhi Li

TL;DR
This paper investigates how adaptive optimization methods influence the local geometry of neural network training trajectories, revealing they bias the process towards regions with more favorable curvature, which may explain their superior convergence.
Contribution
The paper introduces a new local trajectory statistic, $R^{\text{OPT}}_{\text{med}}$, and demonstrates empirically and theoretically that adaptive methods steer training towards regions with smaller values of this statistic.
Findings
Adaptive methods bias trajectories towards regions with smaller $R^{\text{OPT}}_{\text{med}}$.
Vanilla gradient methods tend to bias towards regions with larger $R^{\text{OPT}}_{\text{med}}$.
Empirical and theoretical evidence supports the new local geometry perspective.
Abstract
Adaptive optimization methods are well known to achieve superior convergence relative to vanilla gradient methods. The traditional viewpoint in optimization, particularly in convex optimization, explains this improved performance by arguing that, unlike vanilla gradient schemes, adaptive algorithms mimic the behavior of a second-order method by adapting to the global geometry of the loss function. We argue that in the context of neural network optimization, this traditional viewpoint is insufficient. Instead, we advocate for a local trajectory analysis. For iterate trajectories produced by running a generic optimization algorithm OPT, we introduce $R^{\text{OPT}}_{\text{med}}$, a statistic that is analogous to the condition number of the loss Hessian evaluated at the iterates. Through extensive experiments, we show that adaptive methods such as Adam bias the trajectories towards regions…
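The abstract excerpt describes $R^{\text{OPT}}_{\text{med}}$ only as "analogous to the condition number of the loss Hessian" and does not give its formula. As a rough illustration, the sketch below assumes a stand-in definition, the ratio of the largest to the median absolute diagonal Hessian entry, and compares it with the classical condition number on a toy diagonal Hessian. The function names, the stand-in formula, and the numbers are all hypothetical and are not taken from the paper.

```python
from statistics import median

def max_to_median_ratio(hessian_diag):
    """Assumed stand-in statistic: max |H_ii| / median |H_ii|.

    This is an illustrative guess at a condition-number-like quantity,
    not the paper's confirmed definition.
    """
    magnitudes = [abs(h) for h in hessian_diag]
    return max(magnitudes) / median(magnitudes)

def condition_number_diagonal(hessian_diag):
    """Condition number of a diagonal matrix: max |H_ii| / min |H_ii|.

    For a diagonal Hessian the eigenvalues are the diagonal entries.
    """
    magnitudes = [abs(h) for h in hessian_diag]
    return max(magnitudes) / min(magnitudes)

# Hypothetical diagonal Hessian at one iterate: one sharp direction, several flat ones.
toy_diag = [100.0, 4.0, 2.0, 1.0, 1.0]

ratio = max_to_median_ratio(toy_diag)        # 100 / 2 = 50.0
cond = condition_number_diagonal(toy_diag)   # 100 / 1 = 100.0
print(ratio, cond)
```

Under this assumed definition, a smaller ratio means the curvature is spread more evenly across coordinates, which is the sense in which a region would be "more favorable" for a coordinate-wise step size.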
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Image Processing Techniques
