Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective
Joel Wendin, Claudio Altafini

TL;DR
This survey reviews recent advances in understanding the dynamics and loss landscape of gradient flow equations in deep linear neural networks, highlighting their mathematical properties and critical point structure.
Contribution
It provides a comprehensive analysis of the gradient flow equations formulated as matrix ODEs, revealing their nilpotent, polynomial, and isospectral nature, and describes the loss landscape's critical points and invariants.
Findings
Gradient flow equations form nilpotent, polynomial, isospectral matrix ODEs.
The loss landscape has infinitely many global minima and saddle points (both strict and nonstrict), but no local minima or maxima.
Critical values correspond to singular values of data learned by the network.
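The link between critical values and singular values can be checked numerically in the simplest setting. The sketch below is illustrative and not taken from the paper: it assumes a two-layer linear network with identity input and loss L = 0.5*||Y - W2 W1||_F^2, and the helper names (`critical_point`, `loss_and_grads`) and the choice of retained modes `keep` are hypothetical. It builds a point from a subset of the singular modes of Y, then checks that both gradients vanish there and that the loss equals half the sum of the squared singular values that were left out.

```python
import numpy as np

def critical_point(Y, keep):
    """Build (W1, W2) from the singular modes of Y listed in `keep`.

    Assumption: two-layer network, identity input, hidden width len(keep).
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    idx = list(keep)
    root = np.sqrt(s[idx])
    W2 = U[:, idx] * root            # scale the kept left singular vectors (columns)
    W1 = root[:, None] * Vt[idx, :]  # scale the kept right singular vectors (rows)
    return W1, W2, s

def loss_and_grads(Y, W1, W2):
    """Loss 0.5*||Y - W2 W1||_F^2 and its gradients w.r.t. W1 and W2."""
    E = Y - W2 @ W1
    loss = 0.5 * np.sum(E ** 2)
    g1 = -W2.T @ E
    g2 = -E @ W1.T
    return loss, g1, g2

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 5))

keep = [0, 2]                        # hypothetical choice of "learned" modes
W1, W2, s = critical_point(Y, keep)
loss, g1, g2 = loss_and_grads(Y, W1, W2)

unlearned = [i for i in range(len(s)) if i not in keep]
expected = 0.5 * np.sum(s[unlearned] ** 2)

print("max |gradient| :", max(np.abs(g1).max(), np.abs(g2).max()))
print("loss           :", loss)
print("0.5*sum sigma^2:", expected)
```

Keeping every mode gives a global minimum with zero loss in this setting, while dropping modes gives the other critical values.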
Abstract
The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated with deep linear neural networks, i.e., the gradient descent training dynamics (in the limit when the step size goes to 0) of deep neural networks without activation functions and subject to quadratic loss functions. When formulated in terms of the adjacency matrix of the neural network, as we do in the paper, these gradient flow equations form a class of converging matrix ODEs which is nilpotent, polynomial, isospectral, and with conservation laws. The loss landscape is described in detail. It is characterized by infinitely many global minima and saddle points, both strict and nonstrict, but lacks local minima and maxima. The loss function itself is a positive semidefinite Lyapunov function for the gradient flow, and its level sets are unbounded invariant…
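Two of the properties named in the abstract, the loss acting as a Lyapunov function and the existence of conservation laws, can be observed in a small simulation. The sketch below makes several assumptions that are not from the paper: it uses the usual layer-wise weight matrices rather than the adjacency-matrix formulation, approximates the gradient flow by explicit Euler steps with a small step size, and picks arbitrary layer sizes and random data. The conserved quantities it tracks are the standard differences W_{j+1}^T W_{j+1} - W_j W_j^T between consecutive layers, which are constant along the exact flow and drift only slightly under the discretization.

```python
import numpy as np

def loss_and_grads(Ws, X, Y):
    """Loss 0.5*||W_h ... W_1 X - Y||_F^2 and its gradient w.r.t. each W_j."""
    acts = [X]                       # acts[j] is the input seen by layer j
    for W in Ws:
        acts.append(W @ acts[-1])
    E = acts[-1] - Y                 # output residual
    loss = 0.5 * np.sum(E ** 2)
    grads = [None] * len(Ws)
    back = E                         # residual pulled back through later layers
    for j in reversed(range(len(Ws))):
        grads[j] = back @ acts[j].T
        back = Ws[j].T @ back
    return loss, grads

def invariants(Ws):
    """Differences W_{j+1}^T W_{j+1} - W_j W_j^T for consecutive layers."""
    return [Ws[j + 1].T @ Ws[j + 1] - Ws[j] @ Ws[j].T
            for j in range(len(Ws) - 1)]

rng = np.random.default_rng(0)
sizes = [4, 3, 3, 2]                 # hypothetical layer widths (input to output)
n = 10                               # hypothetical number of samples
X = rng.standard_normal((sizes[0], n)) / np.sqrt(n)
Y = rng.standard_normal((sizes[-1], n)) / np.sqrt(n)
Ws = [0.5 * rng.standard_normal((sizes[j + 1], sizes[j]))
      for j in range(len(sizes) - 1)]

step = 1e-3                          # small step as a stand-in for the flow
C0 = invariants(Ws)
losses = []
for _ in range(3000):
    loss, grads = loss_and_grads(Ws, X, Y)
    losses.append(loss)
    Ws = [W - step * g for W, g in zip(Ws, grads)]

drift = max(np.linalg.norm(C - C_init)
            for C, C_init in zip(invariants(Ws), C0))
print("initial loss   :", losses[0])
print("final loss     :", losses[-1])
print("invariant drift:", drift)
```

Reducing `step` should shrink the reported drift, since the invariants are exact only in the continuous-time limit.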
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Reservoir Computing
