Competing nonlinearities, criticality, and order-to-chaos transition in deep networks
Omri Lesser, Debanjan Chowdhury

TL;DR
This paper shows that mixing two nonlinear activations across the neurons of a deep network produces a continuous phase transition. At the critical mixture, signal propagation becomes scale invariant, which affects training and generalization.
Contribution
It demonstrates a new mechanism for a continuous phase transition in neural networks via statistical mixtures of activations, with practical implications for training and architecture design.
Findings
The depth scaling of the preactivation variance changes sharply at a critical mixture proportion p_c.
Networks at p_c exhibit depth-independent variance and scale invariance.
Training results show optimal performance near the theoretical critical point p_c.
Abstract
Deep neural networks owe their expressive power to nonlinear activation functions. The effective field theory of signal propagation at initialization reveals a few distinct universality classes of activations that exhibit different depth scaling. Tuning across these, especially with analytical control, is an open problem. We show that a statistical mixture of activations, where each neuron independently and randomly draws its activation from a two-component distribution with mixing fraction p, provides a new mechanism for a continuous phase transition. Applied to a mixture of Tanh and Swish, the transition is sharp in the depth scaling of the preactivation variance, separating a variance-collapsing from a variance-inflating phase; at p = p_c, the network acquires statistical scale invariance, with depth-independent variance, without sacrificing smoothness. This resolves a longstanding…
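The mixture construction described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, the bias-free Gaussian initialization with weight variance c_w / width, the default values, and the choice that the mixing fraction p is the probability of drawing Tanh (rather than Swish) are all assumptions made here.

```python
import numpy as np


def swish(z):
    # Swish (SiLU): z * sigmoid(z)
    return z / (1.0 + np.exp(-z))


def mixed_activation(z, use_tanh):
    # use_tanh is a boolean mask with one entry per neuron (last axis of z):
    # True selects Tanh for that neuron, False selects Swish.
    return np.where(use_tanh, np.tanh(z), swish(z))


def preactivation_variances(p, width=512, depth=30, c_w=1.0, n_inputs=64, seed=0):
    """Empirical preactivation variance per layer of a random, untrained MLP.

    Assumptions (not taken from the paper): no biases, weights drawn from
    N(0, c_w / width), and p is the per-neuron probability of using Tanh.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_inputs, width))  # first-layer preactivations
    variances = [z.var()]
    for _ in range(depth):
        # Each neuron independently draws its activation type.
        use_tanh = rng.random(width) < p
        w = rng.standard_normal((width, width)) * np.sqrt(c_w / width)
        z = mixed_activation(z, use_tanh) @ w
        variances.append(z.var())
    return np.array(variances)


if __name__ == "__main__":
    # Scan a few mixing fractions and compare first and last layer variance.
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        v = preactivation_variances(p)
        print(f"p={p:.2f}  var(layer 0)={v[0]:.4f}  var(layer {len(v) - 1})={v[-1]:.4f}")
```

Scanning p and plotting the returned variances against depth is one way to look for the mixture at which the variance stops drifting with depth. Where that happens, and whether it happens at all, depends on the assumed initialization and will not necessarily match the paper's p_c.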
