On the Analysis of Trajectories of Gradient Descent in the Optimization of Deep Neural Networks
Adepu Ravi Sankar, Vishwak Srinivasan, Vineeth N Balasubramanian

TL;DR
This paper investigates how adding noise during training influences the trajectories of gradient descent in deep neural networks, showing that noise can help reach full-rank solutions and potentially global optima.
Contribution
It provides a theoretical framework linking noise injection methods to increased rank of weight matrix products, aiding in understanding optimization in deep networks.
Findings
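Adding noise during training increases the rank of the product of weight matrices in a multi-layer linear neural network.
When the product matrix is full-rank, noise can help gradient descent reach a global optimum (under certain conditions).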
Empirical results support the theoretical analysis.
Abstract
Theoretical analysis of the error landscape of deep neural networks has garnered significant interest in recent years. In this work, we theoretically study the importance of noise in the trajectories of gradient descent towards optimal solutions in multi-layer neural networks. We show that adding noise (in different ways) to a neural network while training increases the rank of the product of weight matrices of a multi-layer linear neural network. We thus study how adding noise can assist in reaching a global optimum when the product matrix is full-rank (under certain conditions). We establish theoretical connections between the noise introduced into the neural network (to the gradient, to the architecture, or to the input/output of the network) and the rank of the product of weight matrices. We corroborate our theoretical findings with empirical results.
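The central claim of the abstract, that noise raises the rank of the product of weight matrices, can be illustrated numerically. The following is a minimal sketch, not the authors' experimental setup: it perturbs the weights of a three-layer linear network directly with Gaussian noise, whereas the paper considers noise in the gradient, the architecture, or the input/output. The dimensions, the noise scale, the rank tolerance, and the helper names `product_rank` and `make_rank_deficient` are all assumptions chosen for illustration.

```python
import numpy as np

RANK_TOL = 1e-10  # assumed absolute threshold for treating a singular value as zero


def product_rank(weights):
    """Rank of W_L ... W_1, with weights listed from the first layer to the last."""
    product = weights[0]
    for w in weights[1:]:
        product = w @ product
    return int(np.linalg.matrix_rank(product, tol=RANK_TOL))


def make_rank_deficient(rng, dim, rank):
    """Random dim x dim matrix whose rows beyond `rank` are exactly zero."""
    w = rng.standard_normal((dim, dim))
    w[rank:, :] = 0.0
    return w


rng = np.random.default_rng(0)
dim, depth, noise_scale = 6, 3, 0.1  # illustrative values, not from the paper

# Rank-deficient layers: every layer has rank at most 2, so the product does too.
weights = [make_rank_deficient(rng, dim, rank=2) for _ in range(depth)]

# Perturb every weight matrix with small isotropic Gaussian noise.
noisy_weights = [w + noise_scale * rng.standard_normal(w.shape) for w in weights]

rank_before = product_rank(weights)
rank_after = product_rank(noisy_weights)
print(f"rank before noise: {rank_before}, rank after noise: {rank_after}")
```

A random perturbation makes each layer full-rank with probability one, so the product also becomes full-rank. This is only the generic linear-algebra intuition behind the result; the conditions under which a full-rank product leads to a global optimum are those stated in the paper, not anything this sketch establishes.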
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Tensor decomposition and applications
