Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, and Arindam Banerjee

TL;DR
This paper investigates the dynamics and generalization of SGD in deep neural networks using Hessian-based analysis, providing new empirical insights and theoretical results on optimization behavior and scale-invariant bounds.
Contribution
It introduces a Hessian-based framework to analyze SGD dynamics and generalization, connecting the Hessian to stochastic gradients and proposing scale-invariant generalization bounds.
Findings
The Hessian of the training loss is related to the second moment of the stochastic gradients
SGD dynamics can be characterized by the Hessian and the first and second moments of the stochastic gradients
Scale-invariant generalization bounds are derived from the Hessian-based analysis
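To make the first finding concrete, here is a minimal NumPy sketch of one classical setting in which the Hessian and the second moment of per-sample gradients nearly coincide: logistic regression evaluated at the parameters that generated the labels (the Fisher information identity). This is an illustration under stated assumptions, not the paper's experiment or result. The toy data, the parameter vector `w_true`, and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def per_sample_grads(w, X, y):
    # Gradient of the logistic (cross-entropy) loss for each sample:
    # g_i = (p_i - y_i) * x_i, one row per sample.
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X


def hessian(w, X):
    # Hessian of the average logistic loss: (1/n) * sum_i p_i (1 - p_i) x_i x_i^T.
    p = sigmoid(X @ w)
    return (X * (p * (1.0 - p))[:, None]).T @ X / X.shape[0]


def sg_second_moment(w, X, y):
    # Uncentered second moment of the per-sample (stochastic) gradients:
    # (1/n) * sum_i g_i g_i^T.
    G = per_sample_grads(w, X, y)
    return G.T @ G / G.shape[0]


# Hypothetical toy problem: labels are drawn from the model itself,
# so the model is well specified at w_true (an assumption of this sketch).
rng = np.random.default_rng(0)
n, d = 5000, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)

H = hessian(w_true, X)
M = sg_second_moment(w_true, X, y)

# Relative Frobenius distance between the two matrices.
rel_err = np.linalg.norm(H - M) / np.linalg.norm(H)
print("relative difference between Hessian and SG second moment:", rel_err)
```

The agreement relies on the labels being sampled from the model at `w_true`. Away from that point, or for a misspecified model, the two matrices can differ substantially, which is part of why their relationship in deep nets is a research question rather than a given.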
Abstract
While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization dynamics and generalization are still not well understood. In this paper, we present new empirical observations and theoretical results on both the optimization dynamics and generalization behavior of SGD for deep nets based on the Hessian of the training loss and associated quantities. We consider three specific research questions: (1) what is the relationship between the Hessian of the loss and the second moment of stochastic gradients (SGs)? (2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs? and (3) how can we characterize a scale-invariant generalization bound of deep nets based on the Hessian of the loss,…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
