Analytic theory of dropout regularization

Francesco Mori; Francesca Mignacco

arXiv:2505.07792 · stat.ML · September 10, 2025

TL;DR

This paper provides an analytical framework for understanding dropout regularization in neural networks. It shows that dropout improves generalization by reducing correlations between hidden nodes and by mitigating the impact of label noise, and that the optimal dropout probability depends on the noise level in the data.

Contribution

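It derives a set of ordinary differential equations that, in the high-dimensional limit, exactly describe the training dynamics of two-layer neural networks with dropout under online stochastic gradient descent. These equations yield exact results for the generalization error and for the optimal dropout probability.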
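High-dimensional analyses of online SGD typically close the dynamics on a small set of overlap matrices between first-layer weight vectors, and ODEs of this kind are written for those overlaps. As a minimal sketch, assuming a standard teacher-student setup and the classic Saad-Solla notation (Q for student-student, R for student-teacher, T for teacher-teacher overlaps; these labels are our assumption, not necessarily the paper's symbols), the quantities such ODEs track can be computed as follows. The helper `hidden_correlation` is a hypothetical name.

```python
import math


def overlaps(student, teacher):
    """Overlap matrices between first-layer weight vectors (lists of length N).

    Q[i][j] = w_i . w_j / N    (student-student)
    R[i][n] = w_i . w*_n / N   (student-teacher)
    T[n][m] = w*_n . w*_m / N  (teacher-teacher, constant during training)
    """
    n_dim = len(student[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b)) / n_dim

    Q = [[dot(wi, wj) for wj in student] for wi in student]
    R = [[dot(wi, tn) for tn in teacher] for wi in student]
    T = [[dot(tn, tm) for tm in teacher] for tn in teacher]
    return Q, R, T


def hidden_correlation(Q, i, j):
    """Cosine similarity between student hidden units i and j, read off Q."""
    return Q[i][j] / math.sqrt(Q[i][i] * Q[j][j])


if __name__ == "__main__":
    Q, R, T = overlaps([[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]])
    print(Q)  # [[0.5, 0.0], [0.0, 2.0]]
    print(hidden_correlation(Q, 0, 1))  # 0.0
```

The normalized off-diagonal entries of Q are one natural way to quantify correlations between hidden nodes; whether the paper measures them exactly this way is an assumption.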

Findings

01

Dropout reduces correlations between hidden nodes.

02

The optimal dropout probability increases with the noise level in the data.

03

Dropout mitigates the impact of label noise.

Abstract

Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite its widespread adoption, dropout probabilities are often selected heuristically, and theoretical explanations of its success remain sparse. Here, we analytically study dropout in two-layer neural networks trained with online stochastic gradient descent. In the high-dimensional limit, we derive a set of ordinary differential equations that fully characterize the evolution of the network during training and capture the effects of dropout. We obtain a number of exact results describing the generalization error and the optimal dropout probability at short, intermediate, and long training times. Our analysis shows that dropout reduces detrimental…
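The abstract describes dropout as deactivating subsets of the network during training, applied to a two-layer network trained with online SGD. The sketch below simulates that microscopic dynamics under assumptions that are ours, not the paper's: a teacher-student setup with Gaussian inputs, erf activations, second-layer weights fixed to 1, additive Gaussian label noise with standard deviation `noise_std`, squared loss, and the original dropout convention of scaling the hidden-layer output by the keep probability at test time. All function and parameter names are hypothetical.

```python
import math
import random


def g(z):
    """Hidden-unit activation erf(z / sqrt(2)) (an assumed choice)."""
    return math.erf(z / math.sqrt(2.0))


def g_prime(z):
    """Derivative of g."""
    return math.sqrt(2.0 / math.pi) * math.exp(-z * z / 2.0)


def preactivations(weights, x):
    """Local fields lambda_k = w_k . x / sqrt(N)."""
    n = len(x)
    return [sum(wi * xi for wi, xi in zip(w, x)) / math.sqrt(n) for w in weights]


def train_with_dropout(n, k_student, k_teacher, p_drop, lr, steps, noise_std, seed=0):
    """Online SGD on a two-layer committee machine with dropout on hidden units."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must lie in [0, 1)")
    rng = random.Random(seed)
    teacher = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k_teacher)]
    student = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k_student)]
    for _ in range(steps):
        # Online SGD: every step uses a fresh sample, never reused.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = sum(g(l) for l in preactivations(teacher, x)) + noise_std * rng.gauss(0, 1)
        lam = preactivations(student, x)
        # Each hidden unit is kept independently with probability 1 - p_drop.
        mask = [1.0 if rng.random() >= p_drop else 0.0 for _ in range(k_student)]
        err = sum(r * g(l) for r, l in zip(mask, lam)) - y
        for w, r, l in zip(student, mask, lam):
            if r == 0.0:
                continue  # dropped units receive no gradient on this step
            # Gradient of 0.5 * err**2 w.r.t. w is err * g'(lambda) * x / sqrt(N).
            coef = lr * err * g_prime(l) / math.sqrt(n)
            for i in range(n):
                w[i] -= coef * x[i]
    return teacher, student


def generalization_error(teacher, student, p_drop, samples=2000, seed=1):
    """Monte Carlo estimate of 0.5 * E[(student - teacher)^2] on clean labels."""
    rng = random.Random(seed)
    n = len(teacher[0])
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        t = sum(g(l) for l in preactivations(teacher, x))
        # Test time: all units active, output scaled by the keep probability.
        s = (1.0 - p_drop) * sum(g(l) for l in preactivations(student, x))
        total += 0.5 * (s - t) ** 2
    return total / samples


if __name__ == "__main__":
    # Small, fast demo; the theory concerns the limit of large n.
    for p in (0.0, 0.3):
        teacher, student = train_with_dropout(
            n=50, k_student=3, k_teacher=2, p_drop=p,
            lr=0.5, steps=2000, noise_std=0.5,
        )
        print(p, generalization_error(teacher, student, p_drop=p))
```

Sweeping `p_drop` at several values of `noise_std` is one way to probe by simulation how the best dropout probability shifts with label noise; the exact predictions come from the paper's ODEs, not from this finite-size code.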

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Machine Learning and ELM

Methods: Dropout · Sparse Evolutionary Training