Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
Lucas Fernandez Sarmiento

TL;DR
This paper develops a mean-field theory of dropout at the edge of chaos, revealing universal scaling laws and optimal schedules that improve neural network performance without extra cost.
Contribution
It introduces a new theoretical framework for understanding dropout as a perturbation at the edge of chaos, identifying universality classes and optimal scheduling strategies.
Findings
Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization.
Smooth activations and kinked, ReLU-like activations belong to different universality classes with distinct critical exponents.
Optimal dropout schedules significantly reduce test loss and improve accuracy in neural networks.
Abstract
We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under fixed budget; a rank-flow tie-breaker then selects…
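The fixed-point shift described in the abstract can be illustrated with a minimal numerical sketch. It is not the paper's own parameterization. It assumes the standard mean-field correlation map for a ReLU network with zero bias variance (the arc-cosine form), inverted dropout with independent masks on the two inputs, and weight variance tuned so that the pre-activation variance stays at its fixed point. Under those assumptions the keep probability simply multiplies the correlation map. The names `relu_corr_map`, `fixed_point`, and `depth_scale` are hypothetical helpers introduced here for illustration.

```python
import math


def relu_corr_map(c, keep_prob=1.0):
    """One layer of the assumed mean-field correlation map for ReLU.

    Assumptions (not taken from the paper): zero bias variance, variance
    held at its fixed point, independent dropout masks per input, so the
    keep probability appears as an overall prefactor.
    """
    c = max(-1.0, min(1.0, c))  # guard against round-off outside [-1, 1]
    kernel = math.sqrt(max(0.0, 1.0 - c * c)) + (math.pi - math.acos(c)) * c
    return keep_prob * kernel / math.pi


def fixed_point(keep_prob, c0=0.5, depth=200):
    """Iterate the map from c0 for `depth` layers and return the result."""
    c = c0
    for _ in range(depth):
        c = relu_corr_map(c, keep_prob)
    return c


def depth_scale(c_star, keep_prob):
    """Depth scale xi = -1 / ln(chi), with chi the slope of the map at c_star.

    For the assumed map, d/dc of the kernel is (pi - acos(c)), so
    chi = keep_prob * (1 - acos(c_star) / pi).
    """
    chi = keep_prob * (1.0 - math.acos(max(-1.0, min(1.0, c_star))) / math.pi)
    if chi >= 1.0:
        return math.inf  # marginal fixed point: no finite exponential scale
    return -1.0 / math.log(chi)


if __name__ == "__main__":
    for rho in (1.0, 0.95, 0.9):
        c_star = fixed_point(rho)
        print(rho, c_star, depth_scale(c_star, rho))
```

Without dropout (`keep_prob=1.0`) the map sends perfect alignment to itself with unit slope, so no finite exponential depth scale exists at that point. With `keep_prob < 1` perfect alignment is no longer a fixed point, the iteration settles at a correlation below one, and the slope there is strictly less than one, which gives a finite depth scale in this toy setting.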
