Analytic theory of dropout regularization
Francesco Mori, Francesca Mignacco

TL;DR
This paper provides an analytical framework for understanding dropout regularization in neural networks. It shows that dropout improves generalization by reducing correlations between hidden nodes and the impact of label noise, and that the optimal dropout probability increases with the noise level in the data.
Contribution
It introduces a set of ordinary differential equations that model the effect of dropout in two-layer neural networks trained with online stochastic gradient descent in the high-dimensional limit, yielding exact results for the generalization error and the optimal dropout probability.
Findings
Dropout reduces correlations between hidden nodes.
Optimal dropout probability increases with data noise.
Dropout mitigates the impact of label noise.
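The first finding concerns correlations between hidden nodes. One simple way to quantify them, used here as an illustrative assumption rather than the paper's own definition, is the cosine overlap between the first-layer weight vectors of different hidden nodes. The function names below are hypothetical.

```python
import numpy as np

def hidden_node_overlaps(W):
    """Cosine overlaps between hidden-node weight vectors.

    W has shape (K, N): one row of first-layer weights per hidden node.
    This measure is an assumption for illustration, not the paper's definition.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    W_unit = W / norms
    return W_unit @ W_unit.T  # shape (K, K), ones on the diagonal

def mean_offdiag_overlap(W):
    """Average overlap between distinct hidden nodes (requires K >= 2)."""
    Q = hidden_node_overlaps(W)
    K = Q.shape[0]
    off_diagonal = Q[~np.eye(K, dtype=bool)]
    return float(off_diagonal.mean())
```

A value near 1 means the hidden nodes have learned nearly parallel weight vectors; a value near 0 means they are close to orthogonal.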
Abstract
Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite its widespread adoption, dropout probabilities are often selected heuristically, and theoretical explanations of its success remain sparse. Here, we analytically study dropout in two-layer neural networks trained with online stochastic gradient descent. In the high-dimensional limit, we derive a set of ordinary differential equations that fully characterize the evolution of the network during training and capture the effects of dropout. We obtain a number of exact results describing the generalization error and the optimal dropout probability at short, intermediate, and long training times. Our analysis shows that dropout reduces detrimental…
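The setting described in the abstract, a two-layer network trained with online stochastic gradient descent under dropout, can be simulated directly. The following is a minimal sketch under stated assumptions: a teacher-student setup with Gaussian inputs, `tanh` activations, fixed unit second-layer weights, additive Gaussian label noise, and rescaling by the keep probability at test time. None of these choices is taken from the summary above, and all names and default values are hypothetical.

```python
import numpy as np

def train_online_dropout(N=200, K=2, M=1, p_keep=0.7, lr=0.5,
                         noise_std=0.3, steps=4000, seed=0):
    """Online SGD on a two-layer student with dropout on hidden nodes.

    Assumed setup (not from the source): teacher-student data, tanh units,
    second-layer weights fixed to 1, one fresh sample per step.
    """
    rng = np.random.default_rng(seed)
    W_teacher = rng.standard_normal((M, N))
    W = 0.01 * rng.standard_normal((K, N))  # student first-layer weights
    for _ in range(steps):
        x = rng.standard_normal(N)  # fresh input: online learning
        y = np.tanh(W_teacher @ x / np.sqrt(N)).sum()
        y += noise_std * rng.standard_normal()  # noisy label
        pre = W @ x / np.sqrt(N)
        mask = rng.random(K) < p_keep  # each hidden node kept with prob. p_keep
        y_hat = (mask * np.tanh(pre)).sum()
        err = y_hat - y
        # Gradient of 0.5 * err**2 with respect to W; dropped nodes get zero.
        delta = err * mask * (1.0 - np.tanh(pre) ** 2)
        W -= lr * delta[:, None] * x[None, :] / np.sqrt(N)
    return W, W_teacher

def generalization_error(W, W_teacher, p_keep, n_test=5000, seed=1):
    """Monte Carlo estimate of the test error on noiseless teacher labels.

    Hidden outputs are rescaled by p_keep at test time (an assumption).
    """
    rng = np.random.default_rng(seed)
    N = W.shape[1]
    X = rng.standard_normal((n_test, N))
    y = np.tanh(X @ W_teacher.T / np.sqrt(N)).sum(axis=1)
    y_hat = p_keep * np.tanh(X @ W.T / np.sqrt(N)).sum(axis=1)
    return 0.5 * float(np.mean((y_hat - y) ** 2))
```

Sweeping `p_keep` and `noise_std` in such a simulation is one way to explore numerically how the best dropout level depends on label noise; the paper's analysis addresses this through differential equations rather than simulation alone.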
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Machine Learning and ELM
Methods: Dropout · Sparse Evolutionary Training
