Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models

Chaoyue Liu; Libin Zhu; Mikhail Belkin

arXiv:2203.05104·cs.LG·March 11, 2022

TL;DR

This paper explains the transition to linearity of wide neural networks as an emergent property of assembling a large number of diverse weak sub-models, one per neuron, none of which dominates the assembly.

Contribution

The paper treats a neural network as an assembly model built recursively from sub-models that correspond to individual neurons, and uses this view to explain why a linear structure emerges as the network becomes wide.

Findings

01. Linearity emerges as a property of assembling many weak sub-models.

02. Wide neural networks can be viewed as an assembly of diverse neurons.

03. The assembly process explains the near-constant NTK in wide networks.

Abstract

Wide neural networks with linear output layer have been shown to be near-linear, and to have near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive since in general neural networks are highly complex models. Why does a linear structure emerge when the networks become wide? In this work, we provide a new perspective on this "transition to linearity" by considering a neural network as an assembly model recursively built from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominate the assembly.
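The effect described in the abstract can be checked numerically on a toy case. The sketch below is not the paper's construction: it assumes a two-layer network f(W) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x) with a fixed linear output layer, a single unit-norm input, and the 1/sqrt(m) scaling usually used in NTK analyses. All function names are hypothetical. Each neuron contributes one "weak" sub-model, the Hessian with respect to W is block diagonal, and its spectral norm shrinks like 1/sqrt(m) while the gradient norm stays of order one, so the network is close to its linearization within a ball of fixed radius.

```python
import numpy as np

# Assumed toy model (not from the paper):
#   f(W) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x)
# Each term is one neuron, i.e. one weak sub-model in the assembly.

def make_network(m, d=5, seed=0):
    """Draw hidden weights W, a fixed +/-1 output layer v, and a unit-norm input x."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, d))      # one row per neuron
    v = rng.choice([-1.0, 1.0], size=m)  # linear output layer, kept fixed
    x = np.ones(d) / np.sqrt(d)          # single input with ||x|| = 1
    return W, v, x

def f(W, v, x):
    return v @ np.tanh(W @ x) / np.sqrt(len(v))

def grad(W, v, x):
    # d f / d w_i = v_i * tanh'(w_i . x) * x / sqrt(m)
    slope = 1.0 - np.tanh(W @ x) ** 2
    return np.outer(v * slope, x) / np.sqrt(len(v))

def hessian_norm(W, v, x):
    # Block i of the Hessian is v_i * tanh''(w_i . x) * x x^T / sqrt(m),
    # so the spectral norm is max_i |tanh''(w_i . x)| * ||x||^2 / sqrt(m).
    t = np.tanh(W @ x)
    curvature = -2.0 * t * (1.0 - t ** 2)
    return np.max(np.abs(curvature)) * (x @ x) / np.sqrt(len(v))

def linearization_error(m, radius=1.0, seed=0):
    """|f(W + delta) - f(W) - <grad f(W), delta>| for a random delta of given norm."""
    W, v, x = make_network(m, seed=seed)
    rng = np.random.default_rng(seed + 1)
    delta = rng.standard_normal(W.shape)
    delta *= radius / np.linalg.norm(delta)
    linear = f(W, v, x) + np.sum(grad(W, v, x) * delta)
    return abs(f(W + delta, v, x) - linear)

if __name__ == "__main__":
    for m in (10, 100, 1000, 10000):
        W, v, x = make_network(m)
        print(m, hessian_norm(W, v, x),
              np.linalg.norm(grad(W, v, x)), linearization_error(m))
```

Since |tanh''| never exceeds about 0.77, the Hessian norm in this toy model is at most 0.77/sqrt(m), and by Taylor's theorem the linearization error at radius R is at most 0.385 * R^2 / sqrt(m). No single neuron can break this bound, which mirrors the "none of which dominate the assembly" condition in the abstract.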

Videos

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models (SlidesLive)

Taxonomy

Topics: Advanced Memory and Neural Computing · Machine Learning and ELM · Stochastic Gradient Optimization Techniques