Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)
Umangi Jain, Harish G. Ramaswamy

TL;DR
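This paper introduces a method to predict whether gradient descent will successfully train a deep neural network for a given dataset-architecture-initialization (DAI) combination. The method analyzes how the singular values of a matrix obtained from the hidden layers evolve during training. This enables early give-up: training runs predicted not to generalize well can be stopped early in the training process.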
Contribution
The paper proposes using the evolution of hidden-layer singular values to predict training success without requiring validation labels. This supports earlier decisions about whether a training run is worth continuing.
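The singular-value signal described above can be illustrated with a minimal sketch. It assumes a one-hidden-layer ReLU network trained by full-batch gradient descent on synthetic data. The tracked matrix is the hidden activations of unlabeled probe inputs, with one row per input and one column per hidden unit. That choice is an assumption: this summary does not specify which matrix or layer the authors use. The names `hidden_singular_values` and `train_and_track` are hypothetical.

```python
import numpy as np


def hidden_singular_values(H: np.ndarray) -> np.ndarray:
    """Singular values of a hidden-layer matrix H (n_samples x n_units), descending."""
    # compute_uv=False returns only the singular values, sorted in descending order.
    return np.linalg.svd(H, compute_uv=False)


def train_and_track(X, y, X_probe, hidden=32, lr=0.1, epochs=50, seed=0):
    """Full-batch gradient descent on a one-hidden-layer ReLU network with a
    softmax cross-entropy loss. After every epoch, record the singular values of
    the hidden activations on X_probe (inputs only; no labels are used)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = int(y.max()) + 1
    # Initialization (the "I" in DAI), assumed here: scaled Gaussian weights, zero biases.
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), size=(hidden, k))
    b2 = np.zeros(k)
    Y = np.eye(k)[y]  # one-hot training labels
    history = []
    for _ in range(epochs):
        # Forward pass with a numerically stable softmax.
        Z1 = X @ W1 + b1
        H = np.maximum(Z1, 0.0)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass for the mean cross-entropy loss.
        G2 = (P - Y) / n
        dW2 = H.T @ G2
        db2 = G2.sum(axis=0)
        G1 = (G2 @ W2.T) * (Z1 > 0)
        dW1 = X.T @ G1
        db1 = G1.sum(axis=0)
        # Plain gradient descent step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
        # Label-free probe: singular values of the hidden matrix on X_probe.
        H_probe = np.maximum(X_probe @ W1 + b1, 0.0)
        history.append(hidden_singular_values(H_probe))
    return np.stack(history)  # shape: (epochs, min(n_probe, hidden))


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_probe = rng.normal(size=(100, 10))
spectra = train_and_track(X, y, X_probe)
print(spectra.shape)  # (50, 32)
```

Each row of `spectra` is one epoch's spectrum, so the "evolution" is read column by column across rows. Only the probe inputs enter the recorded quantity, which mirrors the label-free setting described in the paper.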
Findings
Singular value dynamics correlate with training success.
The method outperforms early validation accuracy in prediction.
Applicable across multiple datasets and architectures.
Abstract
Despite the massive success of deep neural networks, training them successfully still largely relies on experimentally choosing an architecture, hyper-parameters, initialization, and training mechanism. In this work, we focus on determining the success of the standard gradient descent method for training deep neural networks on a specified dataset, architecture, and initialization (DAI) combination. Through extensive systematic experiments, we show that the evolution of the singular values of the matrix obtained from the hidden layers of a DNN can aid in determining whether the gradient descent technique will succeed in training a DAI, even in the absence of validation labels in the supervised learning paradigm. This phenomenon can facilitate early give-up: stopping, early in the training process, the training of neural networks that are predicted not to generalize well. Our experimentation across multiple…
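The early give-up idea can be sketched as a label-free check on a recorded spectrum, assuming one row of hidden-layer singular values per epoch. Both the summary statistic and the threshold rule are hypothetical choices made for illustration. The statistic is the entropy-based effective rank, and the rule lives in `should_give_up`. This summary does not state the authors' actual decision criterion. In practice, any threshold would need calibrating on DAIs whose training outcome is already known.

```python
import numpy as np


def effective_rank(singular_values: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank: exp(H(p)) with p_i = s_i / sum(s)."""
    s = np.asarray(singular_values, dtype=float)
    total = s.sum()
    if total <= eps:
        return 0.0
    p = s / total
    p = p[p > eps]  # drop zero entries so that log(p) is defined
    return float(np.exp(-(p * np.log(p)).sum()))


def should_give_up(spectra: np.ndarray, check_epoch: int, min_ratio: float) -> bool:
    """HYPOTHETICAL rule (not the paper's criterion): give up at `check_epoch`
    if the effective rank of the hidden-layer spectrum has fallen below
    `min_ratio` times its value at epoch 0. `spectra` holds one row of
    singular values per epoch."""
    if check_epoch >= len(spectra):
        raise ValueError("check_epoch is beyond the recorded epochs")
    start = effective_rank(spectra[0])
    now = effective_rank(spectra[check_epoch])
    return now < min_ratio * start


flat = np.ones(8)  # evenly spread spectrum: effective rank 8
collapsed = np.array([10.0] + [1e-6] * 7)  # nearly rank-1 spectrum
print(should_give_up(np.stack([flat, collapsed]), check_epoch=1, min_ratio=0.5))  # True
```

The check reads only singular values computed from inputs, so it needs no validation labels. A real predictor would replace the fixed `min_ratio` with a decision rule fitted to observed training outcomes.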
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
