Deep Kernel Posterior Learning under Infinite Variance Prior Weights

Jorge Loría; Anindya Bhadra

arXiv:2410.01284 · stat.ML · May 5, 2025

TL;DR

This paper introduces a novel Bayesian deep neural network model with infinite variance weights that converges to a process with stable marginals, enabling stochastic kernels and improved representation learning.

Contribution

It demonstrates that deep neural networks with elliptically distributed weights of infinite variance converge to stable processes, extending prior Gaussian process results and enabling stochastic kernel representations.
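The stable limit described above rests on weights whose variance is infinite. As a minimal sketch (not the authors' code), the Python snippet below draws symmetric α-stable variables through a standard Gaussian scale-mixture representation: a positive stable scale `S` is drawn first, and the weight is Gaussian conditional on `S`. The function names, the use of Kanter's sampler for the scale, and the unit-scale parameterization are assumptions made for illustration; the paper's exact prior may be parameterized differently.

```python
import numpy as np


def positive_stable(a, size, rng):
    """Draw S >= 0 with Laplace transform exp(-s**a), for 0 < a < 1.

    Uses Kanter's representation (an illustrative choice of sampler).
    """
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)) * (
        np.sin((1.0 - a) * u) / w
    ) ** ((1.0 - a) / a)


def symmetric_stable_weights(alpha, size, rng):
    """Draw symmetric alpha-stable weights, 0 < alpha < 2, as a Gaussian scale mixture.

    Conditional on the scale S, each weight is N(0, 2 * S); marginally its
    characteristic function is exp(-|t|**alpha), so the variance is infinite.
    """
    s = positive_stable(alpha / 2.0, size, rng)
    g = rng.standard_normal(size)
    return np.sqrt(2.0 * s) * g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (1.0, 1.5, 1.9):
        draws = symmetric_stable_weights(alpha, 100_000, rng)
        # Heavier tails (smaller alpha) show up as larger extreme quantiles.
        print(alpha, np.quantile(np.abs(draws), [0.5, 0.99, 0.9999]))
```

With `alpha = 1` the draws are standard Cauchy, so in large samples the median of their absolute values is close to 1, which gives a quick sanity check on the sampler.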

Findings

01. Stable process convergence with infinite variance weights
02. Recursive linking of random covariance kernels in deep networks
03. Enhanced computational and statistical performance in experiments

Abstract

Neal (1996) proved that infinitely wide shallow Bayesian neural networks (BNNs) converge to Gaussian processes (GPs) when the network weights have bounded prior variance. Cho & Saul (2009) provided a useful recursive formula for deep kernel processes that relates the covariance kernel of each layer to that of the layer immediately below. Moreover, they worked out the explicit form of the layer-wise covariance kernel for several common activation functions. Recent works, including Aitchison et al. (2021), have highlighted that the covariance kernels obtained in this manner are deterministic and hence preclude any possibility of representation learning, which amounts to learning a non-degenerate posterior of a random kernel given the data. To address this, they propose adding artificial noise to the kernel to retain stochasticity, and develop deep kernel inverse Wishart processes.…
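The layer-wise recursion attributed to Cho & Saul (2009) can be made concrete with a short sketch. The Python code below iterates the degree-1 arc-cosine kernel, which corresponds to ReLU activations; the choice of activation, the function names, and the toy inputs are illustrative assumptions, not taken from the paper. Every step is a deterministic function of the previous kernel, so the result is fixed once the inputs are fixed, which is the property the abstract identifies as ruling out representation learning.

```python
import numpy as np


def arccos_kernel_step(K):
    """Apply one layer of the degree-1 arc-cosine recursion to a kernel matrix K.

    k_next(x, y) = (1 / pi) * ||x|| * ||y|| * (sin(t) + (pi - t) * cos(t)),
    where cos(t) = K(x, y) / sqrt(K(x, x) * K(y, y)).
    """
    norms = np.sqrt(np.diag(K))
    scale = np.outer(norms, norms)
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cos_t = np.clip(K / scale, -1.0, 1.0)
    t = np.arccos(cos_t)
    j1 = np.sin(t) + (np.pi - t) * cos_t
    return scale * j1 / np.pi


def deep_arccos_kernel(X, depth):
    """Start from the linear kernel X X^T and apply the recursion `depth` times."""
    K = X @ X.T
    for _ in range(depth):
        K = arccos_kernel_step(K)
    return K


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 3))  # toy inputs: 4 points in 3 dimensions
    print(deep_arccos_kernel(X, depth=3))
```

For this particular kernel the diagonal is unchanged by each step, since the angle between a point and itself is zero; only the off-diagonal similarities evolve with depth.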

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 2

Strengths

- The technical aspects of the paper appear solid, although I have not checked the results in detail.
- The experiments seem to effectively support the theoretical claims.

Weaknesses

- The paper is somewhat challenging to approach, given its niche topic and highly technical content. It also feels quite text-heavy.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The proposed model is capable of representation learning. Specifically, Proposition 3 nicely shows that the feature at layer l depends on the data X, y observed in training, which is unlike many finite width models. It is nice to have this notion of representation learning formalised in this simple way.
- The idea of the paper is relatively straightforward: everything is conditionally Gaussian given this scale parameter, which induces the heavy-tailed behaviour. I see this simpl

Weaknesses

- Unless I am mistaken, "The key finding is that the conditional mutual information decays at a slower rate for smaller α" should be "The key finding is that the mutual information [which itself is computed via MCMC as an expected conditional mutual information] decays at a slower rate for smaller α." Figure 1 shows a mutual information, not conditional mutual information.

**Minor:**

- Theorem 1. $J_\delta(\theta)$ can be computed explicitly, and this is obvious to people famil

Reviewer 03 · Rating 5 · Confidence 2

Strengths

The article
* provides a discussion of references, theoretical results, and experiments.
* discusses potential benefits in prediction and uncertainty quantification.
* suggests feature learning that is not possible under a Gaussian process.

Weaknesses

1. My main concern with the article is the writing and presentation, which I think need to be improved. The abstract gives a long discussion of prior works but ideally it should instead give a crisp description of the main points in the article. The lengthy discussion in the introduction comments on prior works and perceived limitations, but does not provide a sufficiently concise and clear description of the objective, motivation, and contributions of the present work. Terminology could be int

Code & Models

Repositories

loriaj/deep-alpha-kernel · Official

Videos

Deep Kernel Posterior Learning under Infinite Variance Prior Weights · SlidesLive

Taxonomy

Topics: Face and Expression Recognition · Machine Learning and ELM · Neural Networks and Applications