Convolutional Deep Kernel Machines

Edward Milsom; Ben Anson; Laurence Aitchison

arXiv:2309.09814 · stat.ML · February 27, 2024

TL;DR

This paper introduces convolutional deep kernel machines, which extend the deep kernel machine framework to convolutional architectures using a novel inducing point approximation and several new techniques, achieving state-of-the-art results for kernel methods on image classification benchmarks.

Contribution

It develops convolutional deep kernel machines, together with a novel inter-domain inducing point approximation and techniques such as analogues of batch normalization, improving performance on image datasets.

Findings

01 Achieved 99% test accuracy on MNIST

02 Achieved 72% test accuracy on CIFAR-100

03 Achieved 92.7% test accuracy on CIFAR-10

Abstract

Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found that applying this modified limit to a deep Gaussian process gives a practical learning algorithm which they dubbed the deep kernel machine (DKM). However, they only considered the simplest possible setting: regression in small, fully connected networks with e.g. 10 input features. Here, we introduce convolutional deep kernel machines. This required us to develop a novel inter-domain inducing point approximation, as well as introducing and experimentally assessing a number of techniques not…
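
The abstract's opening claim can be made concrete: in the standard NNGP limit, each layer's kernel is a fixed, deterministic function of the previous layer's Gram matrix, so the intermediate representations are determined by the inputs alone and nothing is learned from the labels. The following is a minimal sketch of that recursion, assuming a fully connected ReLU network with weight variance 2/fan-in (the degree-1 arc-cosine kernel) and non-zero inputs. The function names are illustrative and are not taken from the paper or its repository.

```python
import math


def input_gram(x):
    """Gram matrix of the inputs, normalised by the number of features."""
    n_features = len(x[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / n_features for xj in x]
            for xi in x]


def relu_nngp_kernel(gram):
    """Map a P x P Gram matrix to the next-layer ReLU NNGP kernel.

    Assumes weight variance 2/fan-in, so the diagonal is preserved.
    Assumes every diagonal entry of `gram` is strictly positive.
    """
    p = len(gram)
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            norm = math.sqrt(gram[i][i] * gram[j][j])
            # Clamp to [-1, 1] to guard against rounding error before acos.
            cos_t = max(-1.0, min(1.0, gram[i][j] / norm))
            theta = math.acos(cos_t)
            out[i][j] = norm * (math.sin(theta)
                                + (math.pi - theta) * cos_t) / math.pi
    return out


def nngp_kernel(x, depth):
    """Apply the fixed kernel recursion `depth` times to the input Gram matrix."""
    gram = input_gram(x)
    for _ in range(depth):
        gram = relu_nngp_kernel(gram)
    return gram


if __name__ == "__main__":
    # Two orthogonal inputs: the output depends only on x, never on labels.
    for row in nngp_kernel([[1.0, 0.0], [0.0, 1.0]], depth=2):
        print(row)
```

Because `nngp_kernel` takes only the inputs and a depth, there is no quantity at any intermediate layer that training could adjust. As the abstract describes, the deep kernel machine instead retains representation learning at the intermediate layers.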

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

* The proposed convolutional DKM model has strong performance on classification benchmarks. 92% accuracy on CIFAR-10 is quite impressive if we classify DKM into the broad class of kernel methods.
* The technical development is sound: the proposed model formulation and training algorithms are sensible and computationally efficient.

Weaknesses

* The contribution seems incremental. The proposed method is a combination of DKMs and the inter-domain inducing points used in convolutional (D)GPs.
* The objective for learning parametric Gram matrices is regularized by an NNGP prior: $\mathrm{KL}(N(0, G^\ell) \| N(0, K(G^{\ell-1})))$ computed from the Gram matrix of the previous layer. Although this seems a sensible objective function to enable representation learning, it is unclear what the first principle is behind it. It is mentioned briefly in the
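
For reference, the KL term quoted in the review above has a standard closed form for zero-mean Gaussians with $P \times P$ covariances $G$ and $K$: $\mathrm{KL}(N(0, G) \| N(0, K)) = \tfrac{1}{2}\left[\mathrm{tr}(K^{-1} G) - P + \log\det K - \log\det G\right]$. The sketch below evaluates it using only the Python standard library. It assumes both matrices are symmetric positive definite, the helper names are hypothetical, and it is not the implementation from the paper's repository (which uses PyTorch).

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ y = b for a lower-triangular matrix `low`."""
    y = [0.0] * len(low)
    for i in range(len(low)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return y


def kl_zero_mean_gaussians(g, k):
    """KL(N(0, g) || N(0, k)) for P x P symmetric positive-definite g and k."""
    p = len(g)
    chol_g, chol_k = cholesky(g), cholesky(k)
    # log det A = 2 * sum of the logs of the Cholesky diagonal.
    logdet_g = 2.0 * sum(math.log(chol_g[i][i]) for i in range(p))
    logdet_k = 2.0 * sum(math.log(chol_k[i][i]) for i in range(p))
    # tr(K^{-1} G) equals the squared Frobenius norm of chol_k^{-1} chol_g,
    # accumulated one column of chol_g at a time.
    trace = 0.0
    for j in range(p):
        column = [chol_g[i][j] for i in range(p)]
        trace += sum(v * v for v in forward_solve(chol_k, column))
    return 0.5 * (trace - p + logdet_k - logdet_g)


if __name__ == "__main__":
    g = [[2.0, 0.5], [0.5, 1.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    print(kl_zero_mean_gaussians(g, k))
```

The term is zero exactly when $G^\ell = K(G^{\ell-1})$, so it penalises Gram matrices that move away from what the NNGP prior would produce from the previous layer.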

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- The work is a significant contribution to the field of infinite-width NNs.
- The extension to sparse convolutional DKM is novel.
- The method has good results on several standard benchmarks.

Weaknesses

- I found the definition of $C^\ell$ in Eq. 18 very confusing and unintuitive. This should be the inducing-point equivalent of the BNN convolution operation in Eq. 8. However, this involves summing over all of the inducing points in the previous layer. This is unusual and isn't part of any standard convolution.
- Experiments only show accuracy and NLL, and while this is fine for standard machine learning models, the benefit of the Bayesian approach is mostly in the other benefits it adds besides

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

In general, the elements involved in the work and the technical framework used are clear. Despite the "a bit giant" title, the content presented reflects what the title conveys, so the focus of the work is well presented.

Weaknesses

However, for the work itself, I struggled somewhat to appreciate the benefits (both theoretical and practical) of applying the proposed method.

Code & Models

Repositories

edwardmilsom/convdkmpaper (PyTorch, official)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Advanced Neural Network Applications · Machine Learning and Data Classification

Methods: Gaussian Process · Neural Tangent Kernel