A note on Linear Bottleneck networks and their Transition to Multilinearity
Libin Zhu, Parthe Pandit, Mikhail Belkin

TL;DR
This paper shows that linear neural networks with bottleneck layers do not transition to linear functions of their weights as width grows, but instead become multilinear functions of the weights, with a degree set by the number of bottlenecks rather than the total depth.
Contribution
It shows that linear networks with bottleneck layers learn multilinear functions of their weights near initialization. This extends the analysis of wide networks beyond the regime in which every layer is infinitely wide.
Findings
Bottleneck layers induce multilinearity in linear neural networks.
The degree of multilinearity is one more than the number of bottleneck layers (a single bottleneck gives a bilinear function), independent of the total depth.
Transition to linearity fails when the infinite width assumption is violated.
Abstract
Randomly initialized wide neural networks transition to linear functions of weights as the width grows, in a ball of radius $O(1)$ around initialization. A necessary condition for this result is that all layers of the network are wide enough, i.e., all widths tend to infinity. However, the transition to linearity breaks down when this infinite width assumption is violated. In this work we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization. In general, for $B-1$ bottleneck layers, the network is a degree $B$ multilinear function of weights. Importantly, the degree only depends on the number of bottlenecks and not the total depth of the network.
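The bilinear claim can be probed numerically. The following is a minimal sketch under our own assumptions, not the authors' code: a scalar-output linear network with widths d → m → k → m → 1, NTK-style 1/√fan-in scaling, and a single bottleneck of width k. It measures the second directional derivative of the output along two hand-picked unit-norm weight perturbations: one confined to the wide layers before the bottleneck, and one that straddles the bottleneck. The functions `forward`, `curvature` and `worst_case_curvatures`, the architecture, and all widths are hypothetical choices made for illustration.

```python
import numpy as np


def forward(W1, W2, W3, w4, x):
    """Hypothetical linear net d -> m -> k -> m -> 1 with 1/sqrt(fan_in) scaling."""
    m, d = W1.shape
    k = W2.shape[0]
    h1 = W1 @ x / np.sqrt(d)      # first wide layer (width m)
    b = W2 @ h1 / np.sqrt(m)      # bottleneck (width k)
    h3 = W3 @ b / np.sqrt(k)      # second wide layer (width m)
    return w4 @ h3 / np.sqrt(m)   # scalar output


def curvature(params, deltas, x):
    """f(W + D) + f(W - D) - 2 f(W), the second directional derivative along D.

    Exact when D perturbs at most two weight matrices (as in the directions
    used here), since f is multilinear in the weights and therefore a
    quadratic polynomial in t along the line W + tD.
    """
    plus = [p + dp for p, dp in zip(params, deltas)]
    minus = [p - dp for p, dp in zip(params, deltas)]
    return forward(*plus, x) + forward(*minus, x) - 2.0 * forward(*params, x)


def worst_case_curvatures(m, k, d=10, seed=0):
    """Curvature along two unit-norm directions at a random initialization.

    Each direction is aligned to maximize curvature among unit-norm
    perturbations of its pair of weight matrices.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((m, d))
    W2 = rng.standard_normal((k, m))
    W3 = rng.standard_normal((m, k))
    w4 = rng.standard_normal(m)
    x = np.ones(d)
    params = [W1, W2, W3, w4]
    zeros = [np.zeros_like(p) for p in params]
    s = 1.0 / np.sqrt(2.0)  # split the unit norm evenly over the two matrices

    # Direction inside the wide segment before the bottleneck: (W1, W2).
    c = w4 @ W3 / np.sqrt(m * k)  # how the output reads the bottleneck
    u = np.zeros(m)
    u[0] = 1.0
    within = list(zeros)
    within[0] = s * np.outer(u, x / np.linalg.norm(x))
    within[1] = s * np.outer(c / np.linalg.norm(c), u)

    # Direction straddling the bottleneck: (W2, W3).
    h1 = W1 @ x / np.sqrt(d)
    e = np.zeros(k)
    e[0] = 1.0
    across = list(zeros)
    across[1] = s * np.outer(e, h1 / np.linalg.norm(h1))
    across[2] = s * np.outer(w4 / np.linalg.norm(w4), e)

    return curvature(params, within, x), curvature(params, across, x)


if __name__ == "__main__":
    for m in (100, 10_000):
        w, a = worst_case_curvatures(m, k=4)
        print(f"bottleneck k=4, m={m}: within {w:.4f}, across {a:.4f}")
    w, a = worst_case_curvatures(1_000, k=1_000)
    print(f"no bottleneck, m=k=1000: within {w:.4f}, across {a:.4f}")
```

Working through the algebra for these directions, the within-segment value is ‖c‖‖x‖/√(md), which shrinks like 1/√m, while the across-bottleneck value is ‖w4‖‖h1‖/(m√k), which stays near 1/√k when k is fixed. Setting k = m removes the bottleneck, and the across value then also shrinks like 1/√m, in line with the transition to linearity when every layer is wide.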
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration
