Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes

Roman Novak; Lechao Xiao; Jaehoon Lee; Yasaman Bahri; Greg Yang; Jiri Hron; Daniel A. Abolafia; Jeffrey Pennington; Jascha Sohl-Dickstein

arXiv:1810.05148 · stat.ML · August 24, 2020 · 170 cites

Open Access

TL;DR

This paper establishes an equivalence between multi-layer convolutional neural networks with many channels and Gaussian processes, enabling Bayesian analysis of such networks and achieving state-of-the-art CIFAR10 results among GPs without trainable kernels.

Contribution

It derives the Gaussian process equivalence for CNNs with and without pooling layers, introduces a Monte Carlo method for estimating the GP that corresponds to a given network architecture, and explores the impact of translation equivariance in the infinite-channel limit.
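The Monte Carlo idea can be illustrated with a small sketch: draw random networks at initialization, then average products of their activations on two inputs to approximate the kernel. Everything below is an illustrative assumption rather than the authors' implementation: a single 1D convolutional layer with one input channel, circular padding, ReLU, and no pooling; the function names `conv1d_relu_features` and `mc_kernel`; and all default parameter values.

```python
import math
import random


def conv1d_relu_features(x, filters):
    """Apply each filter to x with circular padding, then ReLU.

    Returns a list of per-channel activation lists (channels x positions).
    """
    d = len(x)
    k = len(filters[0])
    feats = []
    for w in filters:
        row = []
        for pos in range(d):
            # Pre-activation at this position: dot product with a centered patch.
            z = sum(w[j] * x[(pos + j - k // 2) % d] for j in range(k))
            row.append(max(z, 0.0))
        feats.append(row)
    return feats


def mc_kernel(x1, x2, n_channels=64, n_samples=50, filter_size=3,
              sigma_w2=2.0, seed=0):
    """Monte Carlo estimate of the kernel between inputs x1 and x2.

    Averages products of activations over random weight draws, channels and
    positions. Averaging position-wise products mirrors a flatten + dense
    readout (no pooling), up to the readout weight variance.
    """
    rng = random.Random(seed)
    # Weight variance sigma_w2 / fan_in, with fan_in = filter_size
    # (a single input channel is assumed here).
    std = math.sqrt(sigma_w2 / filter_size)
    d = len(x1)
    total = 0.0
    for _ in range(n_samples):
        filters = [[rng.gauss(0.0, std) for _ in range(filter_size)]
                   for _ in range(n_channels)]
        f1 = conv1d_relu_features(x1, filters)
        f2 = conv1d_relu_features(x2, filters)
        for c in range(n_channels):
            for pos in range(d):
                total += f1[c][pos] * f2[c][pos]
    return total / (n_samples * n_channels * d)


if __name__ == "__main__":
    a = [1.0, -0.5, 0.3, 2.0, 0.0, -1.2, 0.7, 0.4]
    b = [0.2, 0.9, -1.1, 0.5, 1.5, -0.3, 0.0, 0.8]
    print("K(a, a) estimate:", mc_kernel(a, a))
    print("K(a, b) estimate:", mc_kernel(a, b))
```

For this toy setup with `sigma_w2=2.0`, the diagonal entry has a simple closed form, the mean of the squared inputs, which gives a quick sanity check on the estimate. Increasing `n_channels` or `n_samples` reduces the Monte Carlo noise at proportional cost.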

Findings

01 GP equivalence for CNNs with and without pooling

02 State-of-the-art CIFAR10 results for GPs without trainable kernels

03 SGD-trained CNNs can outperform their GP counterparts

Abstract

There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs). This equivalence enables, for instance, test set predictions that would have resulted from a fully Bayesian, infinitely wide trained FCN to be computed without ever instantiating the FCN, but by instead evaluating the corresponding GP. In this work, we derive an analogous equivalence for multi-layer convolutional neural networks (CNNs) both with and without pooling layers, and achieve state of the art results on CIFAR10 for GPs without trainable kernels. We also introduce a Monte Carlo method to estimate the GP corresponding to a given neural network architecture, even in cases where the analytic form has too many terms to be computationally feasible. Surprisingly, in the absence of pooling layers, the GPs corresponding to CNNs with and without weight sharing…
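To make "evaluating the corresponding GP" concrete, the standard GP regression equations are reproduced below as a reference sketch. The notation is an assumption and is not taken from the paper: $K$ is the kernel induced by the network architecture, $X$ and $Y$ are the training inputs and targets, $x_*$ is a test input, and $\sigma_\varepsilon^2$ is an observation-noise (regularization) variance.

```latex
\begin{aligned}
\mu(x_*) &= K(x_*, X)\,\bigl[K(X, X) + \sigma_\varepsilon^2 I\bigr]^{-1} Y, \\
\Sigma(x_*) &= K(x_*, x_*) - K(x_*, X)\,\bigl[K(X, X) + \sigma_\varepsilon^2 I\bigr]^{-1} K(X, x_*).
\end{aligned}
```

Under these assumptions, prediction only requires kernel evaluations between data points and one linear solve; no network weights are ever sampled or trained.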

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Max Pooling · Convolution · Fully Convolutional Network · Stochastic Gradient Descent