The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective

Geoff Pleiss; John P. Cunningham

arXiv:2106.06529 · cs.LG · November 9, 2021 · 5 citations

TL;DR

Using Deep Gaussian Processes as a lens on neural networks, this paper shows that increasing width can hurt performance: wide models behave increasingly like ordinary Gaussian processes, effectively becoming shallower, so performance peaks at a finite optimal width.

Contribution

The paper provides a theoretical and empirical analysis showing that large width can be detrimental to hierarchical models, identifies a 'sweet spot' width for Deep Gaussian Processes, and relates these findings to conventional neural networks.

Findings

01

As width grows, Deep GP models converge to Gaussian processes, effectively becoming shallower.

02

For the Deep GP models studied, performance peaks at a width of about 1 or 2 units.

03

Increasing width beyond the optimum degrades performance.
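The first finding can be given some intuition with a toy experiment. The sketch below is not the authors' code and does not use a Deep GP: it assumes a finite-width random two-layer tanh network with i.i.d. standard normal weights (the function names `sample_outputs` and `excess_kurtosis` are hypothetical) and measures how far the output at one input is from Gaussian. Excess kurtosis is 0 for a Gaussian, so values approaching 0 as width grows indicate Gaussian-like behavior. It says nothing about test performance.

```python
import numpy as np


def sample_outputs(width, n_samples=10000, x=1.0, seed=0):
    """Draw f(x) from n_samples independent random two-layer networks.

    Assumed toy model (not from the paper):
        f(x) = sum_i v_i * tanh(w_i * x + b_i) / sqrt(width)
    with w_i, b_i, v_i i.i.d. standard normal.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_samples, width))
    b = rng.standard_normal((n_samples, width))
    v = rng.standard_normal((n_samples, width))
    hidden = np.tanh(w * x + b)
    # The 1/sqrt(width) scaling keeps the output variance independent of width.
    return (v * hidden).sum(axis=1) / np.sqrt(width)


def excess_kurtosis(samples):
    """Sample excess kurtosis; 0 corresponds to a Gaussian distribution."""
    centered = samples - samples.mean()
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0


if __name__ == "__main__":
    for width in (1, 2, 8, 64, 256):
        k = excess_kurtosis(sample_outputs(width))
        print(f"width={width:4d}  excess kurtosis={k:.3f}")
```

In this toy setting the narrow networks produce visibly heavy-tailed outputs, while the wide ones are hard to distinguish from a Gaussian. This is the kind of limiting behavior the paper analyzes for Deep GP.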

Abstract

Large width limits have been a recent focus of deep learning research: modulo computational practicalities, do wider networks outperform narrower ones? Answering this question has been challenging, as conventional networks gain representational power with width, potentially masking any negative effects. Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of nonparametric hierarchical models that subsume neural nets. In doing so, we aim to understand how width affects (standard) neural networks once they have sufficient capacity for a given modeling task. Our theoretical and empirical results on Deep GP suggest that large width can be detrimental to hierarchical models. Surprisingly, we prove that even nonparametric Deep GP converge to Gaussian processes, effectively becoming shallower without any…

Code & Models

Repositories

gpleiss/limits_of_large_width · PyTorch · Official

Videos

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective · SlidesLive

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Statistical Mechanics and Entropy · Machine Learning and Data Classification