Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models
Chaoyue Liu, Libin Zhu, Mikhail Belkin

TL;DR
This paper explains the transition to linearity in wide neural networks as an emergent property resulting from assembling many diverse weak sub-models, providing a new perspective on neural network behavior.
Contribution
It introduces a novel assembly model perspective to explain the emergence of linearity in wide neural networks, highlighting the role of diverse weak sub-models.
Findings
Linearity emerges as a property of assembling many weak sub-models.
Wide neural networks can be viewed as an assembly of diverse neurons.
The assembly process explains the near-constant NTK in wide networks.
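The link between near-linearity and a near-constant NTK can be written out with standard calculus. This is a sketch using generic notation, not the paper's own: the symbols $f$, $w_0$, $H_f$, and the ball radius $R$ are assumptions introduced here for illustration.

```latex
% Linearization of the network output around the initial weights w_0
f_{\mathrm{lin}}(w; x) = f(w_0; x) + \nabla_w f(w_0; x)^\top (w - w_0)

% Taylor remainder: for every w with \|w - w_0\| \le R,
\bigl| f(w; x) - f_{\mathrm{lin}}(w; x) \bigr|
  \le \tfrac{1}{2} \, \sup_{\|w' - w_0\| \le R} \bigl\| H_f(w'; x) \bigr\|_2 \, \|w - w_0\|^2

% The gradient moves by at most (Hessian norm) x (distance)
\bigl\| \nabla_w f(w; x) - \nabla_w f(w_0; x) \bigr\|
  \le \sup_{\|w' - w_0\| \le R} \bigl\| H_f(w'; x) \bigr\|_2 \, R

% Neural tangent kernel at weights w
K_w(x, x') = \nabla_w f(w; x)^\top \nabla_w f(w; x')
```

If the Hessian norm is small throughout the ball while the gradient norm stays of order one, the gradient is nearly unchanged from its value at $w_0$, so $K_w$ stays close to $K_{w_0}$.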
Abstract
Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive since in general neural networks are highly complex models. Why does a linear structure emerge when the networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominates the assembly.
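The "many weak sub-models" picture can be checked numerically on a toy case. The sketch below is not the paper's construction. It assumes a two-layer network f(W) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x) with fixed random signs v_i, a unit-norm input x, and standard Gaussian initialization. The function name `curvature_vs_gradient` and all of these choices are hypothetical. In this setting the Hessian with respect to the hidden weights W is block diagonal, with one block per neuron, so its spectral norm is set by the single largest neuron contribution and carries the 1/sqrt(m) factor, while the gradient norm sums over all neurons.

```python
import numpy as np

def curvature_vs_gradient(width, dim=10, seed=0):
    """Return (Hessian spectral norm, gradient norm) with respect to the
    hidden weights W of f(W) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x),
    evaluated at a random Gaussian initialization (illustrative setup)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)                    # unit-norm input, so ||x|| = 1
    W = rng.standard_normal((width, dim))     # hidden weights at initialization
    v = rng.choice([-1.0, 1.0], size=width)   # fixed output signs (assumption)

    t = np.tanh(W @ x)
    d1 = 1.0 - t ** 2                         # tanh'  at each pre-activation
    d2 = -2.0 * t * d1                        # tanh'' at each pre-activation

    # Block i of the Hessian is (v_i * d2_i / sqrt(m)) * x x^T and the
    # off-diagonal blocks vanish, so the spectral norm is the largest block norm.
    hess_norm = np.max(np.abs(v * d2)) / np.sqrt(width)

    # Row i of the gradient is (v_i * d1_i / sqrt(m)) * x; take the Frobenius norm.
    grad_norm = np.linalg.norm(v * d1) / np.sqrt(width)
    return hess_norm, grad_norm

for m in (10, 100, 1_000, 10_000):
    h, g = curvature_vs_gradient(m)
    print(f"width={m:>6}  hessian_norm={h:.5f}  gradient_norm={g:.3f}")
```

Under these assumptions the Hessian norm is bounded by max|tanh''| / sqrt(m), so it shrinks as the width grows, whereas the gradient norm stays of order one. That ratio is what makes the model look linear in a fixed-radius ball around initialization.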
Taxonomy
Topics: Advanced Memory and Neural Computing · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
