Deep Convolutional Networks as shallow Gaussian Processes
Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison

TL;DR
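This paper shows that a deep (residual) convolutional neural network with a suitable prior over its weights and biases is a Gaussian process in the limit of infinitely many convolutional filters. The equivalent kernel can be computed efficiently, and it reaches 0.84% classification error on MNIST, a record for GPs with a comparable number of parameters.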
Contribution
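It extends the known Gaussian process equivalence for dense networks to convolutional and residual CNNs, gives an exact kernel whose only parameters are the hyperparameters of the original CNN, and reports a new MNIST error record for GPs with a comparable number of parameters.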
Findings
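Evaluating the kernel for a pair of images costs about as much as a single forward pass through the original CNN with one filter per layer
The kernel equivalent to a 32-layer ResNet achieves 0.84% classification error on MNIST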
Exact kernel computation for residual CNNs
Abstract
We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
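To make the cost claim in the abstract concrete, here is a minimal NumPy sketch of how a kernel of this kind could be evaluated for one pair of images. It is an illustration under stated assumptions, not the authors' implementation: it assumes ReLU nonlinearities, stride-1 convolutions with zero "same" padding, no residual connections, and a dense readout that averages over positions. The function names (`patch_sum`, `relu_expectation`, `cnn_gp_kernel`), the variance values, and the exact scaling conventions are hypothetical. The only closed form it relies on is the standard expectation of a product of ReLUs under a bivariate Gaussian (the degree-1 arc-cosine kernel).

```python
import numpy as np

def patch_sum(v, k=3):
    """Sum a 2D map over every k x k patch (stride 1, zero 'same' padding, odd k)."""
    pad = k // 2
    h, w = v.shape
    padded = np.pad(v, pad)  # constant zero padding on all sides
    out = np.zeros_like(v, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out

def relu_expectation(kxx, kxy, kyy):
    """E[relu(a) relu(b)] for zero-mean Gaussians with var kxx, kyy and cov kxy."""
    norm = np.sqrt(kxx * kyy)
    cos_t = np.clip(kxy / np.maximum(norm, 1e-12), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * cos_t)

def cnn_gp_kernel(x, y, depth=3, k=3, var_w=2.0, var_b=0.1):
    """Kernel value for images x, y of shape (C, H, W).

    Assumed scaling (illustrative only): var_w / C in the first layer,
    var_w in later layers, and a mean over positions in the readout.
    """
    c = x.shape[0]

    def first_layer(a, b):
        # Covariance of first-layer pre-activations at each spatial position.
        return var_b + var_w / c * patch_sum((a * b).sum(axis=0), k)

    kxx, kxy, kyy = first_layer(x, x), first_layer(x, y), first_layer(y, y)
    for _ in range(depth - 1):
        # Only same-position covariances are propagated: one H x W map per
        # image pair, which is why the cost resembles a one-filter forward pass.
        vxx = relu_expectation(kxx, kxx, kxx)
        vxy = relu_expectation(kxx, kxy, kyy)
        vyy = relu_expectation(kyy, kyy, kyy)
        kxx, kxy, kyy = (var_b + var_w * patch_sum(v, k) for v in (vxx, vxy, vyy))
    # Dense readout over all positions of the last hidden layer.
    return float(var_b + var_w * relu_expectation(kxx, kxy, kyy).mean())
```

Calling `cnn_gp_kernel(x, y)` on two arrays of shape `(1, 28, 28)` returns a single scalar. Each layer touches three `H x W` maps and performs one patch sum per map, which loosely mirrors the abstract's statement that the cost is comparable to a forward pass with one filter per layer.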
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
