Why bigger is not always better: on finite and infinite neural networks

Laurence Aitchison

arXiv:1910.08013 · stat.ML · June 25, 2020 · 5 cites

TL;DR

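This paper argues that infinite Bayesian neural networks cannot learn representations, because their kernel is fixed by the network hyperparameters. It analyses representation learning in finite deep linear networks, introduces infinite networks with bottlenecks, and argues that both can offer advantages over standard infinite models.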

Contribution

It provides analytic insights into finite deep linear networks and proposes a new class of infinite networks with bottlenecks that enable representation learning.

Findings

01. Infinite Bayesian neural networks lack representation learning.

02. Finite deep linear networks show richer representation learning.

03. Bottlenecked infinite networks combine tractability with learning flexibility.

Abstract

Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any form of representation learning. The lack of representation or equivalently kernel learning leads to less flexibility and hence worse performance, giving a potential explanation for the inferior performance of infinite networks observed in the literature (e.g. Novak et al. 2019). We give analytic results characterising the prior over representations and representation learning in finite deep linear networks. We show empirically that the representations in SOTA architectures such as ResNets…
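
The contrast between a fixed kernel and a prior over representations can be illustrated numerically. The following is a minimal sketch, not the paper's analysis: it assumes a bias-free deep linear network with i.i.d. Gaussian weights of variance 1/fan_in, and it takes the "representation" to be the Gram matrix of the top-layer features. The function names `top_layer_gram` and `gram_spread` are hypothetical. Under these assumptions the Gram matrix varies from one prior sample to the next when the layers are narrow, and concentrates around the input Gram matrix as the width grows, which is the sense in which a very wide network has a kernel fixed in advance.

```python
import numpy as np


def top_layer_gram(X, width, depth, rng):
    """Sample one deep linear network from its prior and return the
    Gram matrix of its top-layer features, K = H H^T / width.

    Assumption (not from the paper): weights are i.i.d. N(0, 1/fan_in).
    """
    H = X
    for _ in range(depth):
        fan_in = H.shape[1]
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, width))
        H = H @ W
    return H @ H.T / H.shape[1]


def gram_spread(X, width, depth, n_samples, seed=0):
    """Return the mean Gram matrix over prior samples and the average
    per-entry standard deviation across those samples."""
    rng = np.random.default_rng(seed)
    grams = np.stack(
        [top_layer_gram(X, width, depth, rng) for _ in range(n_samples)]
    )
    return grams.mean(axis=0), grams.std(axis=0).mean()


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    X = data_rng.normal(size=(5, 10))  # 5 toy inputs with 10 features each
    for width in (4, 32, 256):
        _, spread = gram_spread(X, width, depth=3, n_samples=100)
        print(f"width={width:4d}  spread of Gram entries={spread:.3f}")
```

The printed spread should shrink as the width increases. This only samples from the prior; it does not perform posterior inference or reproduce the analytic results the abstract refers to.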

Videos

Why bigger is not always better: on finite and infinite neural networks · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Machine Learning and Data Classification

Methods: Stochastic Gradient Descent · Gaussian Process