Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos

Lucas Fernandez Sarmiento

arXiv:2605.21648·cs.LG·May 22, 2026

TL;DR

This paper develops a mean-field theory of dropout at the edge of chaos, revealing universal scaling laws and optimal schedules that improve neural network performance without extra cost.

Contribution

The paper introduces a theoretical framework that treats dropout as a perturbation of critical signal propagation at the edge of chaos, and uses it to identify universality classes and optimal scheduling strategies.

Findings

01. Dropout shifts the perfect-alignment fixed point, so the depth scale for information propagation is finite even at critical initialization.

02. Smooth and kinked (ReLU-like) activations belong to different universality classes with distinct critical exponents.

03. Optimal dropout schedules significantly reduce test loss and improve accuracy in neural networks.

Abstract

We develop a mean-field theory of dropout as a perturbation of critical signal propagation at the edge of chaos. Dropout shifts the perfect-alignment fixed point, making the depth scale for information propagation finite even at critical initialization. We derive critical and crossover scaling laws for correlation decay and establish that smooth activations and kinked, ReLU-like activations constitute distinct universality classes, with different critical exponents and a universal two-parameter scaling collapse in detuning and dropout strength. The distinction traces to the analytic structure of the correlation map: smooth activations admit a Taylor expansion near perfect alignment, while kinked activations develop a branch point with universal non-analyticity. As a corollary, the framework yields saturated dropout profiles under fixed budget; a rank-flow tie-breaker then selects…
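The fixed-point shift described in the abstract can be illustrated with a small numerical sketch. It is not the paper's code. It assumes the standard mean-field setup from earlier signal-propagation work: a wide ReLU network with zero biases, inverted dropout with keep probability `keep_prob`, and independent masks for the two inputs being compared. Under those assumptions the one-layer correlation map is the usual ReLU (arc-cosine) map multiplied by `keep_prob`. The function names are hypothetical.

```python
import numpy as np

def relu_corr_map(c, keep_prob=1.0):
    """One-layer map c -> c' for the correlation between two inputs.

    Assumes a wide ReLU network, zero biases, inverted dropout, and
    independent dropout masks for the two inputs (illustrative setup).
    """
    c = np.clip(c, -1.0, 1.0)
    no_dropout = (np.sqrt(1.0 - c**2) + c * (np.pi - np.arccos(c))) / np.pi
    return keep_prob * no_dropout

def fixed_point(keep_prob, c0=0.5, n_layers=500):
    """Iterate the map layer by layer; intended for keep_prob < 1."""
    c = c0
    for _ in range(n_layers):
        c = relu_corr_map(c, keep_prob)
    return c

def depth_scale(keep_prob):
    """Depth scale xi = -1 / log(slope of the map at its fixed point)."""
    c_star = fixed_point(keep_prob)
    slope = keep_prob * (1.0 - np.arccos(c_star) / np.pi)
    if slope >= 1.0:
        return float("inf")  # no exponential decay of perturbations
    return -1.0 / np.log(slope)

if __name__ == "__main__":
    for p in (0.99, 0.9, 0.5):
        print(p, fixed_point(p), depth_scale(p))
```

Without dropout (`keep_prob=1.0`) the map sends c = 1 to itself, so perfect alignment is a fixed point. With any `keep_prob` below 1 the map sends c = 1 to `keep_prob`, the fixed point moves below 1, and the slope there is strictly less than 1, which gives a finite depth scale in this toy setting.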
