Convolutional Deep Kernel Machines
Edward Milsom, Ben Anson, Laurence Aitchison

TL;DR
This paper introduces convolutional deep kernel machines, extending the deep kernel machine (DKM) framework with convolutional structure, a novel inter-domain inducing point approximation, and techniques such as batch normalization. The resulting models achieve state-of-the-art results on image classification benchmarks.
Contribution
It develops convolutional deep kernel machines with a new inter-domain inducing point approximation, along with techniques such as batch normalization that improve performance on image datasets.
Findings
Achieved 99% test accuracy on MNIST
Achieved 72% test accuracy on CIFAR-100
Achieved 92.7% test accuracy on CIFAR-10
Abstract
Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found that applying this modified limit to a deep Gaussian process gives a practical learning algorithm which they dubbed the deep kernel machine (DKM). However, they only considered the simplest possible setting: regression in small, fully connected networks with e.g. 10 input features. Here, we introduce convolutional deep kernel machines. This required us to develop a novel inter-domain inducing point approximation, as well as introducing and experimentally assessing a number of techniques not…
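To see why the standard NNGP limit gives up representation learning, note that each layer's Gram matrix is a fixed function of the previous one, so every intermediate kernel is determined by the inputs alone. Below is a minimal NumPy sketch of that recursion. It assumes a fully connected ReLU network with no biases and weight variance σ_w² = 2, and uses the closed-form degree-1 arc-cosine kernel. These choices and the helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def relu_nngp_kernel(G: np.ndarray) -> np.ndarray:
    """Return E[relu(f) relu(f)^T] for f ~ N(0, G): the degree-1 arc-cosine kernel."""
    d = np.sqrt(np.diag(G))               # per-point standard deviations
    norms = np.outer(d, d)                # sqrt(G_ii * G_jj)
    cos_theta = np.clip(G / norms, -1.0, 1.0)
    theta = np.arccos(cos_theta)          # angle between points i and j
    return norms * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2.0 * np.pi)


def nngp_gram_matrices(X: np.ndarray, num_layers: int, weight_var: float = 2.0) -> list:
    """Hypothetical helper: NNGP recursion G^l = weight_var * K(G^{l-1}), no biases.

    Every Gram matrix is a fixed function of X; there is nothing to fit to data.
    """
    G = X @ X.T / X.shape[1]              # input-layer Gram matrix (P x P)
    grams = [G]
    for _ in range(num_layers):
        G = weight_var * relu_nngp_kernel(G)
        grams.append(G)
    return grams


# Usage: 4 hypothetical inputs with 8 features each, pushed through 3 ReLU layers.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
grams = nngp_gram_matrices(X, num_layers=3)
print(np.round(grams[-1], 3))
```

Because `weight_var=2.0` exactly compensates for ReLU halving the variance, the diagonal of every Gram matrix matches the input Gram matrix. The off-diagonal entries do change from layer to layer, but only as a fixed function of `X`.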
Peer Reviews
Decision: ICLR 2024 poster
* The proposed convolutional DKM model has strong performance on classification benchmarks. 92% accuracy on CIFAR-10 is quite impressive if we classify DKM into the broad class of kernel methods.
* The technical development is sound: the proposed model formulation and training algorithms are sensible and computationally efficient.
* The contribution seems incremental. The proposed method is a combination of DKMs and the inter-domain inducing points used in convolutional (D)GPs.
* The objective for learning parametric Gram matrices is regularized by an NNGP prior, $\mathrm{KL}(\mathcal{N}(0, G^\ell) \,\|\, \mathcal{N}(0, K(G^{\ell-1})))$, computed from the Gram matrix of the previous layer. Although this seems a sensible objective function for enabling representation learning, it is unclear what first principle lies behind it. It is mentioned briefly in the
- The work is a significant contribution to the field of infinite-width NNs.
- The extension to sparse convolutional DKMs is novel.
- The method has good results on several standard benchmarks.
- I found the definition of $C^\ell$ in Eq. 18 very confusing and unintuitive. It should be the inducing-point equivalent of the BNN convolution operation in Eq. 8. However, it involves summing over all of the inducing points in the previous layer, which is unusual and not part of any standard convolution.
- Experiments only show accuracy and NLL. While this is fine for standard machine learning models, the benefit of the Bayesian approach is mostly in the other benefits it adds besides
In general, the elements involved in the work and the technical framework it uses are clear. Despite the somewhat grand title, the content presented in the work reflects what the title conveys, so the focus of this work is well presented.
However, regarding the work itself, I struggled somewhat to appreciate the benefits (both theoretical and practical) of applying the proposed method.
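The first review above quotes a regulariser that is a KL divergence between two zero-mean multivariate Gaussians over the same $P$ data points. That divergence has the closed form $\mathrm{KL}(\mathcal{N}(0,\Sigma_1)\,\|\,\mathcal{N}(0,\Sigma_2)) = \tfrac{1}{2}\left[\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) - P + \log|\Sigma_2| - \log|\Sigma_1|\right]$. The NumPy sketch below evaluates it with $\Sigma_1 = G^\ell$ and $\Sigma_2 = K(G^{\ell-1})$. The function name and the toy matrices are illustrative assumptions. Any per-layer weighting or likelihood term in the full DKM objective is deliberately left out.

```python
import numpy as np


def gaussian_kl(S1: np.ndarray, S2: np.ndarray) -> float:
    """KL( N(0, S1) || N(0, S2) ) for P x P symmetric positive-definite covariances."""
    P = S1.shape[0]
    sign1, logdet1 = np.linalg.slogdet(S1)
    sign2, logdet2 = np.linalg.slogdet(S2)
    if sign1 <= 0 or sign2 <= 0:
        raise ValueError("covariances must be positive definite")
    trace_term = np.trace(np.linalg.solve(S2, S1))  # tr(S2^{-1} S1)
    return float(0.5 * (trace_term - P + logdet2 - logdet1))


# Usage with toy matrices (hypothetical): K_prior stands in for K(G^{l-1}),
# G for a learned Gram matrix G^l that has moved away from the prior.
K_prior = np.array([[1.0, 0.5], [0.5, 1.0]])
G = K_prior + 0.1 * np.eye(2)
print(gaussian_kl(K_prior, K_prior))  # ~0: matching the prior is not penalised
print(gaussian_kl(G, K_prior))        # > 0: moving away from the prior is penalised
```

The penalty is zero exactly when $G^\ell = K(G^{\ell-1})$, which recovers the fixed NNGP choice. A positive KL is the price paid for letting $G^\ell$ move towards a representation that better fits the data.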
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications · Machine Learning and Data Classification
Methods: Gaussian Process · Neural Tangent Kernel
