A note on Linear Bottleneck networks and their Transition to Multilinearity

Libin Zhu; Parthe Pandit; Mikhail Belkin

arXiv:2206.15058 · cs.LG · July 1, 2022

TL;DR

This paper shows that linear neural networks with bottleneck layers become multilinear, rather than linear, functions of their weights near initialization, and that the degree of multilinearity depends on the number of bottlenecks rather than on the total depth.

Contribution

It shows that linear networks with bottleneck layers are multilinear functions of their weights in a ball of radius $O(1)$ around initialization, extending the transition-to-linearity picture beyond the regime in which every layer is infinitely wide.

Findings

01

A bottleneck layer makes a linear network a bilinear, rather than linear, function of its weights near initialization.

02

The degree of multilinearity is one more than the number of bottleneck layers: $B - 1$ bottlenecks give a degree $B$ multilinear function, regardless of total depth.

03

Transition to linearity fails when the infinite width assumption is violated.

Abstract

Randomly initialized wide neural networks transition to linear functions of weights as the width grows, in a ball of radius $O(1)$ around initialization. A necessary condition for this result is that all layers of the network are wide enough, i.e., all widths tend to infinity. However, the transition to linearity breaks down when this infinite width assumption is violated. In this work we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization. In general, for $B - 1$ bottleneck layers, the network is a degree $B$ multilinear function of weights. Importantly, the degree only depends on the number of bottlenecks and not the total depth of the network.
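The following is a minimal numerical sketch of the mechanism behind the abstract, not the authors' code. It assumes one wide linear block that maps an input of dimension `d` through a hidden layer of width `m` into a bottleneck of width `b`, with standard normal initialization and a $1/\sqrt{m}$ output scaling; the scaling, the two-layer block, and all names (`block`, `linearization_remainder`, `d`, `m`, `b`) are illustrative assumptions. For a unit-norm weight perturbation, the part of the block output that is not linear in the block's weights is at most $1/\sqrt{m}$, so each wide block becomes linear in its own weights as `m` grows. A network with $B - 1$ bottlenecks is a composition of $B$ such blocks, which is one way to read the degree $B$ claim.

```python
import numpy as np


def block(W1, W2, x):
    """Wide linear block (assumed form): d -> m -> b, scaled by 1/sqrt(m)."""
    m = W1.shape[0]
    return (W2 @ (W1 @ x)) / np.sqrt(m)


def linearization_remainder(m, d=8, b=2, seed=0):
    """Norm of the non-linear part of the block under a unit-norm perturbation."""
    rng = np.random.default_rng(seed)

    # Unit-norm input.
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)

    # Random initialization of the block's weights.
    W1 = rng.standard_normal((m, d))
    W2 = rng.standard_normal((b, m))

    # Perturbations with unit Frobenius norm, i.e. inside an O(1) ball.
    D1 = rng.standard_normal((m, d))
    D1 /= np.linalg.norm(D1)
    D2 = rng.standard_normal((b, m))
    D2 /= np.linalg.norm(D2)

    f0 = block(W1, W2, x)
    f1 = block(W1 + D1, W2 + D2, x)

    # First-order term in (D1, D2); exact because the block is bilinear in (W1, W2).
    linear = block(D1, W2, x) + block(W1, D2, x)

    # What remains is the cross term D2 @ D1 @ x / sqrt(m).
    return np.linalg.norm(f1 - f0 - linear)


if __name__ == "__main__":
    for m in (16, 256, 4096):
        r = linearization_remainder(m)
        print(f"m={m:5d}  remainder={r:.3e}  bound={1 / np.sqrt(m):.3e}")
```

The bound follows from submultiplicativity of matrix norms, so it holds for any seed. The bottleneck width `b` stays fixed while `m` grows, which matches the setting where only the non-bottleneck layers are wide.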

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Mathematical Approximation and Integration