Separation of time scales and direct computation of weights in deep neural networks
Nima Dehmamy, Neda Rohani, Aggelos Katsaggelos

TL;DR
This paper demonstrates that by exploiting time-scale separation in deep neural network training, one can directly compute layer weights using class-based PCA, reducing data needs and training time while maintaining performance.
Contribution
The authors introduce a novel approach leveraging time-scale separation and class-based PCA to directly derive DNN weights, bypassing traditional SGD training.
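As a rough illustration of what "class-based PCA" weights might look like, the sketch below builds one layer's weight matrix by stacking the leading principal directions of each class. The exact procedure is not given here, so the function name `class_pca_weights`, the per-class component count `k`, the within-class centering, and the ReLU nonlinearity are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def class_pca_weights(X, y, k):
    """Stack the top-k principal directions of each class into a weight matrix.

    X: (n_samples, n_features) inputs to the layer
    y: (n_samples,) integer class labels
    k: components kept per class (hypothetical hyperparameter)
    Returns W with shape (n_classes * k, n_features).
    """
    rows = []
    for c in np.unique(y):
        Xc = X[y == c]
        Xc = Xc - Xc.mean(axis=0)  # center within the class (assumption)
        # Right singular vectors of the centered data are the PCA directions,
        # ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        rows.append(Vt[:k])
    return np.vstack(rows)

# Synthetic data purely for demonstration: 200 samples, 16 features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 4, size=200)

W = class_pca_weights(X, y, k=3)   # weights computed directly, no gradient steps
H = np.maximum(X @ W.T, 0.0)       # layer output with an assumed ReLU
```

Under the layer-by-layer picture described in the abstract, `H` could then serve as the input for computing the next layer's weights in the same way; that stacking step is also an assumption of this sketch.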
Findings
Direct PCA-based layer derivation matches or exceeds SGD performance.
Significant reduction in training data needed for effective DNN training.
Training time decreases by eliminating backpropagation and reducing data requirements.
Abstract
Artificial intelligence is revolutionizing our lives at an ever-increasing pace. At the heart of this revolution are the recent advancements in deep neural networks (DNN), learning to perform sophisticated, high-level tasks. However, training DNNs requires massive amounts of data and is very computationally intensive. Gaining analytical understanding of the solutions found by DNNs can help us devise more efficient training algorithms, replacing the commonly used method of stochastic gradient descent (SGD). We analyze the dynamics of SGD and show that, indeed, direct computation of the solutions is possible in many cases. We show that a high performing setup used in DNNs introduces a separation of time-scales in the training dynamics, allowing SGD to train layers from the lowest (closest to input) to the highest. We then show that for each layer, the distribution of solutions found by SGD…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Gaussian Processes and Bayesian Inference
Methods: Principal Components Analysis · Stochastic Gradient Descent
