Accelerating Neural Network Training Along Sharp and Flat Directions
Daniyar Zakarin, Sidak Pal Singh

TL;DR
This paper investigates the role of sharp and flat Hessian directions in neural network training. It studies Bulk-SGD, introduces interpolated gradient methods to accelerate convergence, and analyzes the stability and curvature properties of both.
Contribution
It studies Bulk-SGD, an SGD variant that restricts updates to the orthogonal complement of the Dominant subspace. It also proposes interpolated gradient methods that balance convergence speed against stability.
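The projection behind Bulk-SGD can be illustrated on a small explicit Hessian. The NumPy sketch below assumes that the Dominant subspace is spanned by the eigenvectors of the k algebraically largest Hessian eigenvalues and that the full Hessian fits in memory. The helper names `dominant_basis` and `bulk_sgd_step` are hypothetical and are not the authors' implementation. At realistic scale the top eigenvectors would instead be estimated from Hessian-vector products.

```python
import numpy as np


def dominant_basis(hessian: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: orthonormal (n, k) basis of the top-k eigenspace of a symmetric Hessian."""
    eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    return eigvecs[:, order]


def bulk_sgd_step(params: np.ndarray, grad: np.ndarray, hessian: np.ndarray,
                  k: int, lr: float) -> np.ndarray:
    """Hypothetical Bulk-SGD-style step: drop the gradient component in the Dominant subspace."""
    u = dominant_basis(hessian, k)
    dom_grad = u @ (u.T @ grad)   # projection onto the Dominant subspace, U U^T g
    bulk_grad = grad - dom_grad   # projection onto its orthogonal complement (the Bulk)
    return params - lr * bulk_grad


# Toy example: one sharp direction (curvature 10) and two flatter ones.
H = np.diag([10.0, 1.0, 0.1])
theta = bulk_sgd_step(np.zeros(3), np.ones(3), H, k=1, lr=0.1)
print(theta)  # the sharp coordinate is left untouched; only the flat ones move
```

Because U has orthonormal columns, U U^T is an orthogonal projector. The eigenvector sign ambiguity returned by `eigh` therefore does not affect the step.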
Findings
Updates along flatter (Bulk) directions can accelerate convergence but may compromise stability.
Curvature energy concentrates in the dominant subspace.
Interpolated methods unify SGD, Dom-SGD, and Bulk-SGD.
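The page does not define "curvature energy" precisely. One plausible reading, used here purely as an assumption, is the share of the squared Frobenius norm of the Hessian, sum of lambda_i^2, carried by the top-k eigenvalues. The function name `dominant_energy_fraction` is hypothetical. The sketch simply shows how such a concentration would be measured on a toy spectrum.

```python
import numpy as np


def dominant_energy_fraction(hessian: np.ndarray, k: int) -> float:
    """Hypothetical metric: fraction of ||H||_F^2 = sum(lambda_i^2) held by the top-k eigenvalues."""
    eigvals = np.linalg.eigvalsh(hessian)   # symmetric eigenvalues, ascending order
    top = np.sort(eigvals)[::-1][:k]        # k algebraically largest eigenvalues
    return float(np.sum(top ** 2) / np.sum(eigvals ** 2))


# Toy spectrum with one outlier: almost all of the "energy" sits in the top direction.
H = np.diag([10.0, 1.0, 0.1])
print(dominant_energy_fraction(H, k=1))  # 100 / 101.01, roughly 0.99
```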
Abstract
Recent work has highlighted a surprising alignment between gradients and the top eigenspace of the Hessian (termed the Dominant subspace) during neural network training. Concurrently, there has been growing interest in the distinct roles of sharp and flat directions in the Hessian spectrum. In this work, we study Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace. Through ablation studies, we characterize the stability properties of Bulk-SGD and identify critical hyperparameters that govern its behavior. We show that updates along the Bulk subspace, corresponding to flatter directions in the loss landscape, can accelerate convergence but may compromise stability. To balance these effects, we introduce interpolated gradient methods that unify SGD, Dom-SGD, and Bulk-SGD. Finally, we empirically connect this subspace decomposition to…
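The abstract does not give the exact form of the interpolated methods. A simple parametrization that still recovers all three named optimizers reweights the Dominant and Bulk components of the gradient separately. The weights `w_dom` and `w_bulk` and the function `interpolated_direction` are hypothetical. (1, 1) gives SGD, (1, 0) gives Dom-SGD, and (0, 1) gives Bulk-SGD, and intermediate values trade off between them.

```python
import numpy as np


def interpolated_direction(grad: np.ndarray, basis: np.ndarray,
                           w_dom: float, w_bulk: float) -> np.ndarray:
    """Hypothetical interpolated update direction: w_dom * P_dom g + w_bulk * P_bulk g.

    `basis` is an (n, k) orthonormal basis of the Dominant subspace.
    (1, 1) -> SGD, (1, 0) -> Dom-SGD, (0, 1) -> Bulk-SGD.
    """
    dom = basis @ (basis.T @ grad)  # component in the Dominant subspace
    bulk = grad - dom               # component in the Bulk (orthogonal complement)
    return w_dom * dom + w_bulk * bulk


# Toy setup: the first coordinate axis plays the role of the Dominant subspace.
U = np.eye(3)[:, :1]
g = np.array([1.0, 2.0, 3.0])
print(interpolated_direction(g, U, 0.5, 1.0))  # damp the sharp component, keep the flat ones
```

Scaling the two components independently keeps the damping of sharp directions separate from the step taken along flat ones. This matches the convergence-versus-stability trade-off the abstract describes, although the paper's actual interpolation may differ.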
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
Methods: Stochastic Gradient Descent
