Analytic expressions for the output evolution of a deep neural network
Anastasia Borovykh

TL;DR
This paper introduces a Taylor expansion-based methodology to derive analytical expressions for the expected output and weights of deep neural networks during stochastic training, revealing how noise influences generalization.
Contribution
It provides a novel analytical framework to study the effects of hyperparameters and noise on deep neural network training and generalization.
Findings
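In early training with a small noise coefficient, the output is equivalent to a linear model; noise keeps it from fully converging on the training data, which can improve generalization without acting as explicit regularization.
In later training, where higher-order approximations are required, noise can regularize the output of a model that is non-linear in the weights.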
Noise impacts the weight Hessian, correlating with improved generalization.
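To make the difference between the first-order and higher-order regimes concrete, here is a generic second-order expansion of the output around reference weights. It is an illustrative sketch, not the paper's derivation: the symbols $w_0$, $\delta$, $\sigma$, and $H$ are assumptions introduced here, with $\delta$ a zero-mean weight perturbation of covariance $\sigma^2 I$ and $H$ the Hessian of the output with respect to the weights.

```latex
\begin{align}
f(x; w_0 + \delta) &\approx f(x; w_0)
  + \nabla_w f(x; w_0)^\top \delta
  + \tfrac{1}{2}\, \delta^\top H \delta,
  \qquad H = \nabla_w^2 f(x; w_0), \\
\mathbb{E}\left[ f(x; w_0 + \delta) \right] &\approx f(x; w_0)
  + \tfrac{\sigma^2}{2}\, \operatorname{tr}(H),
  \qquad \text{assuming } \mathbb{E}[\delta] = 0,\ \operatorname{Cov}(\delta) = \sigma^2 I .
\end{align}
```

Under these assumptions the first-order term vanishes in expectation, so zero-mean noise leaves the expected output of a model that is linear in the weights unchanged. The noise first enters the expected output through the second-order term, which is consistent with its effect becoming visible only when higher-order approximations are needed.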
Abstract
We present a novel methodology based on a Taylor expansion of the network output for obtaining analytical expressions for the expected value of the network weights and output under stochastic training. Using these analytical expressions, the effects of the hyperparameters and the noise variance of the optimization algorithm on the performance of the deep neural network are studied. In the early phases of training with a small noise coefficient, the output is equivalent to a linear model. In this case the network can generalize better because the noise prevents the output from fully converging on the training data; however, the noise does not result in any explicit regularization. In the later training stages, when higher-order approximations are required, the impact of the noise becomes more significant, i.e. in a model which is non-linear in the weights, noise can regularize the output…
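The statement that the output is equivalent to a linear model early in training can be checked numerically on a toy problem. The sketch below is not the paper's code: the network `network`, the finite-difference helper `grad_w`, and all sizes and seeds are hypothetical choices made for illustration. It compares a small tanh network with its first-order Taylor expansion in the weights and records how the gap changes as the weights move away from the expansion point.

```python
import numpy as np

def network(w, x):
    """Tiny one-hidden-layer tanh network (hypothetical toy model); w packs both layers."""
    W1 = w[:6].reshape(3, 2)   # hidden layer: 3 units, 2 inputs
    w2 = w[6:9]                # output layer weights
    return float(w2 @ np.tanh(W1 @ x))

def grad_w(w, x, eps=1e-6):
    """Central finite-difference gradient of the output with respect to the weights."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (network(w + step, x) - network(w - step, x)) / (2 * eps)
    return g

def linearized(w, w0, x):
    """First-order Taylor expansion of the output around the weights w0."""
    return network(w0, x) + grad_w(w0, x) @ (w - w0)

rng = np.random.default_rng(0)
w0 = rng.normal(size=9)            # expansion point, e.g. the initial weights
x = np.array([0.5, -1.0])          # a single input
direction = rng.normal(size=9)
direction /= np.linalg.norm(direction)

# Gap between the network and its linearization at two distances from w0.
errors = {}
for scale in (1e-1, 1e-2):
    w = w0 + scale * direction
    errors[scale] = abs(network(w, x) - linearized(w, w0, x))
    print(f"distance {scale:g}: linearization error {errors[scale]:.3e}")
```

For a smooth network one would expect the error to shrink roughly quadratically with the distance from `w0`, which is why a linear model can be adequate while the weights are still close to their starting point and higher-order terms are needed later.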
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
