Deep learning, stochastic gradient descent and diffusion maps
Carmina Fjellström, Kaj Nyström

TL;DR
This paper investigates why stochastic gradient descent (SGD) is effective in deep learning by analyzing the high-dimensional loss landscape through diffusion maps, revealing potential low-dimensional structures that guide optimization.
Contribution
It introduces a data-driven approach using diffusion maps to explore the high-dimensional loss landscape and uncover low-dimensional representations of SGD dynamics.
Findings
Eigenvalues of the Hessian are mostly near zero, indicating low diffusion in many directions.
SGD dynamics may primarily occur on a low-dimensional manifold.
Diffusion maps can reveal low-dimensional structures in the loss landscape.
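To make the last finding concrete, here is a minimal sketch of a basic diffusion map embedding in NumPy. It assumes the standard Gaussian-kernel construction with row normalization; the summary does not state the exact variant used in the paper, so the function name `diffusion_map`, the bandwidth `epsilon`, and the synthetic data are all illustrative assumptions. The sample rows stand in for high-dimensional points (for instance, parameter vectors recorded along an SGD run) that secretly lie on a low-dimensional curve.

```python
import numpy as np


def diffusion_map(points, epsilon, n_components=2, t=1):
    """Basic diffusion map (Gaussian kernel, row-normalized); illustrative only."""
    # Pairwise squared Euclidean distances between the sample rows.
    sq_norms = np.sum(points**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off

    # Gaussian affinity kernel with bandwidth epsilon (an assumed choice).
    kernel = np.exp(-sq_dists / epsilon)

    # The Markov matrix P = D^{-1} K is similar to the symmetric matrix
    # S = D^{-1/2} K D^{-1/2}, which can be diagonalized stably with eigh.
    degree = kernel.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    sym = kernel * inv_sqrt[:, None] * inv_sqrt[None, :]

    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]  # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P; the first one is constant and is skipped.
    psi = eigvecs * inv_sqrt[:, None]
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return eigvals, coords


# Hypothetical data: a circle (intrinsic dimension 1) embedded in 50 dimensions.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
basis, _ = np.linalg.qr(rng.normal(size=(50, 2)))  # orthonormal columns
samples = circle @ basis.T  # shape (200, 50)

eigvals, coords = diffusion_map(samples, epsilon=0.1)
print("leading eigenvalues:", eigvals[:5])
print("embedding shape:", coords.shape)
```

The leading eigenvalue of a row-stochastic matrix is 1 with a constant eigenvector, which is why the embedding starts from the second eigenvector. How quickly the remaining eigenvalues decay is what hints at the intrinsic dimension of the data.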
Abstract
Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency, but a complete understanding of why SGD performs so well remains a major challenge. It has been observed empirically that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep neural networks are close to zero, while only a small number of eigenvalues are large. Zero eigenvalues indicate zero diffusion along the corresponding directions. This indicates that the process of minima selection mainly happens in the relatively low-dimensional subspace corresponding to the top eigenvalues of the Hessian. Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold. In this paper, we pursue a truly data-driven approach to the problem of getting a…
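The Hessian observation in the abstract is empirical for deep networks, but a toy case shows why over-parametrization pushes many eigenvalues to zero. The sketch below assumes a linear least-squares model with far more parameters than samples (a stand-in chosen for illustration, not the paper's setting); its Hessian is X^T X / n, whose rank is at most the number of samples. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_params = 20, 200  # over-parametrized: more parameters than data
X = rng.normal(size=(n_samples, n_params))

# For the loss L(w) = ||X w - y||^2 / (2 n), the Hessian is X^T X / n,
# independent of both w and y.
hessian = X.T @ X / n_samples
eigenvalues = np.linalg.eigvalsh(hessian)  # returned in ascending order

# Count eigenvalues that are zero up to floating-point round-off.
tol = 1e-8 * eigenvalues.max()
n_near_zero = int(np.sum(np.abs(eigenvalues) < tol))

print("near-zero eigenvalues:", n_near_zero, "of", n_params)
print("largest eigenvalues:", eigenvalues[-3:])
```

Here at least n_params - n_samples eigenvalues vanish exactly, so the loss is flat in those directions. For deep networks the analogous statement is an empirical observation rather than a rank argument.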
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
Methods: Diffusion · Stochastic Gradient Descent
