Separation of time scales and direct computation of weights in deep neural networks

Nima Dehmamy; Neda Rohani; Aggelos Katsaggelos

arXiv:1703.04757 · cs.LG · March 13, 2018 · 1 cites

TL;DR

This paper demonstrates that by exploiting time-scale separation in deep neural network training, one can directly compute layer weights using class-based PCA, reducing data needs and training time while maintaining performance.

Contribution

The authors introduce a novel approach leveraging time-scale separation and class-based PCA to directly derive DNN weights, bypassing traditional SGD training.
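As a rough illustration of what "class-based PCA" could look like for a single layer, here is a minimal sketch. It is not taken from the paper: the function name `class_pca_weights`, the choice to center each class on its own mean, the number of components `k`, and the ReLU applied afterwards are all assumptions made for illustration. The idea shown is only that the top principal directions of each class can be stacked into a weight matrix without any gradient-based training.

```python
import numpy as np

def class_pca_weights(X, y, k):
    """Stack the top-k principal directions of each class into a weight matrix.

    Hypothetical helper for illustration; not the authors' exact procedure.
    X: (n_samples, n_features) inputs, y: (n_samples,) integer class labels.
    Returns an array of shape (n_classes * k, n_features).
    """
    rows = []
    for label in np.unique(y):
        Xc = X[y == label]
        # Assumption: center each class on its own mean before PCA.
        Xc = Xc - Xc.mean(axis=0)
        # Rows of Vt are principal directions, ordered by decreasing singular value.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        rows.append(Vt[:k])
    return np.vstack(rows)

# Synthetic data: 3 classes, 20 samples each, 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = np.repeat(np.arange(3), 20)

W = class_pca_weights(X, y, k=2)      # weights computed directly, no SGD
hidden = np.maximum(X @ W.T, 0.0)     # assumed ReLU activation on the layer output
```

Here each class contributes `k` unit-norm filters, so the layer width is `n_classes * k`. A deeper network could in principle repeat the same step on `hidden`, layer by layer, which is in the spirit of the lowest-to-highest ordering the abstract describes, though the details of the paper's construction may differ.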

Findings

01. Direct PCA-based layer derivation matches or exceeds SGD performance.

02. Significant reduction in training data needed for effective DNN training.

03. Training time decreases by eliminating backpropagation and reducing data requirements.

Abstract

Artificial intelligence is revolutionizing our lives at an ever-increasing pace. At the heart of this revolution are the recent advancements in deep neural networks (DNNs), learning to perform sophisticated, high-level tasks. However, training DNNs requires massive amounts of data and is very computationally intensive. Gaining analytical understanding of the solutions found by DNNs can help us devise more efficient training algorithms, replacing the commonly used method of stochastic gradient descent (SGD). We analyze the dynamics of SGD and show that, indeed, direct computation of the solutions is possible in many cases. We show that a high-performing setup used in DNNs introduces a separation of time-scales in the training dynamics, allowing SGD to train layers from the lowest (closest to input) to the highest. We then show that for each layer, the distribution of solutions found by SGD…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Gaussian Processes and Bayesian Inference

Methods: Principal Components Analysis · Stochastic Gradient Descent