Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization

Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, and Arindam Banerjee

arXiv:1907.10732 · cs.LG · July 26, 2019 · 6 cites

Open Access

TL;DR

This paper uses the Hessian of the training loss to study SGD for deep nets. It relates the Hessian to the second moment of stochastic gradients, characterizes SGD dynamics under fixed and adaptive step sizes and diagonal pre-conditioning, and asks how to characterize a scale-invariant generalization bound.

Contribution

It introduces a Hessian-based framework to analyze SGD dynamics and generalization, connecting the Hessian to stochastic gradients and proposing scale-invariant generalization bounds.

Findings

01 Hessian relates to the second moment of stochastic gradients

02 SGD dynamics can be characterized by Hessian and gradient moments

03 Scale-invariant generalization bounds are derived from Hessian analysis

Abstract

While stochastic gradient descent (SGD) and variants have been surprisingly successful for training deep nets, several aspects of the optimization dynamics and generalization are still not well understood. In this paper, we present new empirical observations and theoretical results on both the optimization dynamics and generalization behavior of SGD for deep nets based on the Hessian of the training loss and associated quantities. We consider three specific research questions: (1) what is the relationship between the Hessian of the loss and the second moment of stochastic gradients (SGs)? (2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs? and (3) how can we characterize a scale-invariant generalization bound of deep nets based on the Hessian of the loss,…
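Question (1) in the abstract can be made concrete on a toy problem. The sketch below is not the paper's experiment or method: it assumes a two-parameter logistic regression with cross-entropy loss, and it assumes the labels are sampled from the model itself. Under those assumptions the textbook Fisher identity says the Hessian of the average loss and the second moment of the per-sample gradients agree in expectation. All names (`hessian_and_second_moment`, `w`, `xs`, `ys`) are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def hessian_and_second_moment(w, xs, ys):
    """Return (H, M) for logistic regression with cross-entropy loss.

    H is the Hessian of the average loss: mean of p(1-p) x x^T.
    M is the uncentered second moment of per-sample gradients:
    mean of (p - y)^2 x x^T, since each gradient is (p - y) x.
    """
    d, n = len(w), len(xs)
    H = [[0.0] * d for _ in range(d)]
    M = [[0.0] * d for _ in range(d)]
    for x, y in zip(xs, ys):
        p = sigmoid(dot(w, x))
        curvature = p * (1.0 - p)
        sq_residual = (p - y) ** 2
        for a in range(d):
            for b in range(d):
                H[a][b] += curvature * x[a] * x[b] / n
                M[a][b] += sq_residual * x[a] * x[b] / n
    return H, M


# Assumption: synthetic Gaussian inputs, labels drawn from the model at w.
rng = random.Random(0)
w = [0.5, -1.0]
xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20000)]
ys = [1.0 if rng.random() < sigmoid(dot(w, x)) else 0.0 for x in xs]

H, M = hessian_and_second_moment(w, xs, ys)
max_gap = max(abs(H[a][b] - M[a][b]) for a in range(2) for b in range(2))
print("Hessian:", H)
print("Second moment of stochastic gradients:", M)
print("Largest entrywise gap:", max_gap)
```

With labels drawn from the model, the two matrices differ only by sampling noise. If `ys` came from a different labeling rule, or `w` were far from a good fit, the two could diverge, which is one reason the relationship is worth studying for deep nets, where these assumptions generally do not hold.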

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent