Accelerating Neural Network Training Along Sharp and Flat Directions

Daniyar Zakarin; Sidak Pal Singh

arXiv:2505.11972·cs.LG·May 20, 2025

TL;DR

This paper investigates the role of sharp and flat directions in neural network training, introducing Bulk-SGD and interpolated gradient methods to accelerate convergence while analyzing stability and curvature properties.

Contribution

The paper studies Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace (the top eigenspace of the Hessian), and proposes interpolated gradient methods that balance convergence speed against stability.
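As a rough illustration of restricting an update to the orthogonal complement of the Dominant subspace, here is a minimal NumPy sketch on a toy quadratic loss. It is not the authors' implementation. The function names (`top_k_eigvecs`, `split_gradient`, `bulk_sgd_step`), the explicit dense Hessian, and the choice `k=2` are assumptions made for this example. For a real network the top eigenvectors would usually be estimated without forming the Hessian.

```python
import numpy as np

def top_k_eigvecs(hessian, k):
    """Return an orthonormal basis (columns) for the top-k eigenspace.

    np.linalg.eigh sorts eigenvalues in ascending order, so the last k
    columns correspond to the k largest eigenvalues.
    """
    _, eigvecs = np.linalg.eigh(hessian)
    return eigvecs[:, -k:]

def split_gradient(grad, dom_basis):
    """Split grad into its Dominant part and its Bulk (orthogonal) part."""
    g_dom = dom_basis @ (dom_basis.T @ grad)  # projection onto the top-k eigenspace
    g_bulk = grad - g_dom                     # remainder lies in the orthogonal complement
    return g_dom, g_bulk

def bulk_sgd_step(params, grad, dom_basis, lr):
    """One update that moves only along the Bulk component of the gradient."""
    _, g_bulk = split_gradient(grad, dom_basis)
    return params - lr * g_bulk

# Toy quadratic loss 0.5 * x^T H x with two sharp and two flat directions (assumed values).
H = np.diag([10.0, 5.0, 0.1, 0.01])
x = np.ones(4)
grad = H @ x                      # exact gradient of the toy loss at x
U = top_k_eigvecs(H, k=2)         # basis of the Dominant subspace
x_new = bulk_sgd_step(x, grad, U, lr=0.5)
print(x_new)
```

In this toy case the two sharp coordinates are left unchanged by the step, and only the two flat coordinates move.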

Findings

01. Updates along flatter (Bulk) directions can accelerate convergence but may compromise stability.

02. Curvature energy concentrates in the Dominant subspace.

03. Interpolated gradient methods unify SGD, Dom-SGD, and Bulk-SGD.
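One way to picture how a single family of updates could cover SGD, Dom-SGD, and Bulk-SGD is to weight the two gradient components separately. The sketch below is a hypothetical parameterization chosen for illustration. This page does not give the paper's exact interpolation scheme, so the function `interpolated_direction` and the weights `w_dom` and `w_bulk` are assumptions.

```python
import numpy as np

def interpolated_direction(grad, dom_basis, w_dom, w_bulk):
    """Weighted mix of the Dominant and Bulk gradient components.

    Hypothetical parameterization (not taken from the paper):
      (w_dom, w_bulk) = (1, 1) gives the plain SGD direction,
      (1, 0) keeps only the Dominant part (Dom-SGD),
      (0, 1) keeps only the Bulk part (Bulk-SGD).
    """
    g_dom = dom_basis @ (dom_basis.T @ grad)  # component in the Dominant subspace
    g_bulk = grad - g_dom                     # component in the orthogonal complement
    return w_dom * g_dom + w_bulk * g_bulk

# Toy setup: the Dominant subspace is spanned by the first coordinate axis.
dom_basis = np.eye(3)[:, :1]
grad = np.array([3.0, 2.0, 1.0])

sgd_dir = interpolated_direction(grad, dom_basis, 1.0, 1.0)
dom_dir = interpolated_direction(grad, dom_basis, 1.0, 0.0)
bulk_dir = interpolated_direction(grad, dom_basis, 0.0, 1.0)
```

Intermediate weights would then trade off the faster progress reported for Bulk updates against the stability concerns the abstract mentions.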

Abstract

Recent work has highlighted a surprising alignment between gradients and the top eigenspace of the Hessian -- termed the Dominant subspace -- during neural network training. Concurrently, there has been growing interest in the distinct roles of sharp and flat directions in the Hessian spectrum. In this work, we study Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace. Through ablation studies, we characterize the stability properties of Bulk-SGD and identify critical hyperparameters that govern its behavior. We show that updates along the Bulk subspace, corresponding to flatter directions in the loss landscape, can accelerate convergence but may compromise stability. To balance these effects, we introduce interpolated gradient methods that unify SGD, Dom-SGD, and Bulk-SGD. Finally, we empirically connect this subspace decomposition to…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference

Methods: Stochastic Gradient Descent