Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning
Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang

TL;DR
This paper introduces a geometric analysis of SGD in deep learning, modeling the directions of minibatch gradients with von Mises-Fisher distributions, revealing that their directional uniformity increases during training and significantly influences SGD dynamics.
Contribution
The paper proposes a novel model, based on the von Mises-Fisher distribution, of the directional concentration of minibatch gradients in SGD, providing new insight into SGD dynamics.
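For reference, the standard von Mises-Fisher density on the unit sphere in $\mathbb{R}^p$ is shown below. Here $\mathbf{x}$ and $\boldsymbol{\mu}$ are unit vectors, $\kappa \ge 0$ is the concentration, and $I_\nu$ is the modified Bessel function of the first kind. $\kappa = 0$ is the uniform distribution, and larger $\kappa$ means directions cluster more tightly around $\boldsymbol{\mu}$. The estimator on the last line is the widely used approximation of Banerjee et al. (2005). It is included as an assumption and may not match exactly how the paper estimates $\kappa$ from normalized minibatch gradients $\mathbf{x}_1, \dots, \mathbf{x}_n$.

```latex
f_p(\mathbf{x};\boldsymbol{\mu},\kappa) = C_p(\kappa)\,\exp\!\left(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\right),
\qquad
C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)}

\bar{r} = \frac{1}{n}\left\lVert \sum_{i=1}^{n} \mathbf{x}_i \right\rVert,
\qquad
\frac{I_{p/2}(\hat{\kappa})}{I_{p/2-1}(\hat{\kappa})} = \bar{r}
\quad\Longrightarrow\quad
\hat{\kappa} \approx \frac{\bar{r}\,(p - \bar{r}^{2})}{1 - \bar{r}^{2}}
```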
Findings
Directional uniformity of minibatch gradients increases during training.
Gradient stochasticity correlates more strongly with directional uniformity than with gradient norm.
Directional statistics are a major factor behind SGD behavior.
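The first finding can be illustrated on a toy problem. This is not the paper's experiment, which uses deep convolutional networks. The sketch runs plain SGD on a hypothetical synthetic least-squares problem. It estimates the vMF concentration of 200 minibatch-gradient directions at initialization and again after training. The estimator is the Banerjee et al. (2005) approximation κ̂ ≈ r̄(p − r̄²)/(1 − r̄²), used here as an assumption. Near a minimum, the full-batch gradient is small relative to minibatch noise, so the gradient directions spread out and κ̂ drops.

```python
import numpy as np


def vmf_kappa(vectors):
    """Approximate vMF concentration of the directions of the rows of `vectors`.

    Assumed estimator (Banerjee et al., 2005):
    kappa ~= rbar * (p - rbar**2) / (1 - rbar**2).
    """
    units = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    p = units.shape[1]
    rbar = np.linalg.norm(units.mean(axis=0))  # mean resultant length in [0, 1]
    return rbar * (p - rbar**2) / (1.0 - rbar**2)


def minibatch_grads(w, X, y, rng, batch_size=32, num_batches=200):
    """Least-squares minibatch gradients (1/B) X_B^T (X_B w - y_B), one per row."""
    grads = []
    for _ in range(num_batches):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grads.append(Xb.T @ (Xb @ w - yb) / batch_size)
    return np.stack(grads)


# Hypothetical toy data: linear targets plus Gaussian noise.
rng = np.random.default_rng(0)
n, p = 1000, 10
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true + 0.5 * rng.standard_normal(n)

w = np.zeros(p)
kappa_start = vmf_kappa(minibatch_grads(w, X, y, rng))

# Plain SGD with a constant step size and batch size 32.
lr = 0.05
for _ in range(2000):
    idx = rng.choice(n, size=32, replace=False)
    w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / 32

kappa_end = vmf_kappa(minibatch_grads(w, X, y, rng))
print(f"kappa at init: {kappa_start:.1f}, after SGD: {kappa_end:.1f}")
```

A lower κ̂ means more uniform directions. In this setup κ̂ falls sharply after training. The exact values depend on the seed, step size, and batch size.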
Abstract
Although stochastic gradient descent (SGD) is a driving force behind the recent success of deep learning, our understanding of its dynamics in a high-dimensional parameter space is limited. In recent years, some researchers have used the stochasticity of minibatch gradients, or the signal-to-noise ratio, to better characterize the learning dynamics of SGD. Inspired by this work, we here analyze SGD from a geometrical perspective by inspecting the stochasticity of the norms and directions of minibatch gradients. We propose a model of the directional concentration for minibatch gradients through the von Mises-Fisher (VMF) distribution, and show that the directional uniformity of minibatch gradients increases over the course of SGD. We empirically verify our result using deep convolutional networks and observe a higher correlation between the gradient stochasticity and the proposed…
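The abstract separates minibatch-gradient stochasticity into a norm part and a direction part. The following sketch shows one way to compute both from a stack of gradients. Using the coefficient of variation as the norm statistic is an assumption, as is the Banerjee et al. (2005) approximation for κ. The gradients are hypothetical synthetic data. Rescaling every gradient by a random positive factor changes the norm statistic but leaves κ̂ unchanged, because κ̂ depends only on the unit directions.

```python
import numpy as np


def norm_and_direction_stats(grads):
    """Split minibatch gradients (one per row) into norm and direction statistics.

    Returns (coefficient of variation of the norms, approximate vMF kappa of the
    unit directions). The kappa formula is the Banerjee et al. (2005)
    approximation, assumed here rather than taken from the paper.
    """
    norms = np.linalg.norm(grads, axis=1)
    units = grads / norms[:, None]
    p = grads.shape[1]
    rbar = np.linalg.norm(units.mean(axis=0))  # mean resultant length
    kappa = rbar * (p - rbar**2) / (1.0 - rbar**2)
    return float(norms.std() / norms.mean()), float(kappa)


rng = np.random.default_rng(1)
# Hypothetical minibatch gradients: a shared mean direction plus isotropic noise.
grads = np.ones(50) + 2.0 * rng.standard_normal((100, 50))
# Rescale each gradient by a random positive factor: norms change, directions do not.
rescaled = grads * rng.uniform(0.1, 10.0, size=(100, 1))

cv, kappa = norm_and_direction_stats(grads)
cv_r, kappa_r = norm_and_direction_stats(rescaled)
print(cv, cv_r)        # norm variability is much larger after rescaling
print(kappa, kappa_r)  # directional concentration is unchanged (up to rounding)
```

Because the two statistics respond to different aspects of the gradients, each can be tracked separately over training.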
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Neural Networks and Applications
Methods: Stochastic Gradient Descent
