Amortising Inference and Meta-Learning Priors in Neural Networks

Tommy Rochussen; Vincent Fortuin

arXiv:2602.08782·stat.ML·February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper proposes a method for learning priors for Bayesian neural networks via amortised variational inference. It bridges Bayesian deep learning and meta-learning, enabling flexible priors and improved meta-learning capabilities.

Contribution

It introduces a neural process-based model that learns weight priors from datasets, allowing Bayesian neural networks to be used as generative models and to perform meta-learning with limited data.

Findings

01. Enables Bayesian neural networks to learn priors from data collections.

02. Allows meta-learning and generative modeling with Bayesian neural networks.

03. Supports within-task minibatching and data-starved meta-learning scenarios.

Abstract

One of the core facets of Bayesianism is in the updating of prior beliefs in light of new evidence - so how can we maintain a Bayesian approach if we have no prior beliefs in the first place? This is one of the central challenges in the field of Bayesian deep learning, where it is not clear how to represent beliefs about a prediction task by prior distributions over model parameters. Bridging the fields of Bayesian deep learning and probabilistic meta-learning, we introduce a way to *learn* a weights prior from a collection of datasets by introducing a way to perform per-dataset amortised variational inference. The model we develop can be viewed as a neural process whose latent variable is the set of weights of a BNN and whose decoder is the neural network parameterised by a sample of the latent variable itself. This unique model allows us to study the behaviour of…
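To make the abstract's description concrete, here is a minimal NumPy sketch of the general idea: an inference network maps a context dataset to a distribution over BNN weights, and the decoder is the network evaluated with a sample of those weights. This is not the authors' architecture. The mean-pooling set encoder, the diagonal Gaussian posterior, the one-hidden-layer decoder, and every name (`init_encoder`, `amortised_posterior`, `sample_weights`, `bnn_decode`, `HIDDEN`) are assumptions chosen for illustration, and the encoder parameters are random rather than trained.

```python
import numpy as np

HIDDEN = 4                       # hidden units in the toy BNN decoder (assumption)
N_WEIGHTS = 3 * HIDDEN + 1       # W1 (H) + b1 (H) + W2 (H) + b2 (1) for a 1 -> H -> 1 MLP


def init_encoder(rng, embed_dim=16, n_weights=N_WEIGHTS):
    """Random parameters for a hypothetical set encoder (untrained, illustrative)."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(2, embed_dim)),   # embeds each (x, y) pair
        "b1": np.zeros(embed_dim),
        "W_mu": rng.normal(0.0, 0.5, size=(embed_dim, n_weights)),
        "W_logvar": rng.normal(0.0, 0.1, size=(embed_dim, n_weights)),
    }


def amortised_posterior(enc, x_ctx, y_ctx):
    """Map a context set to the mean and log-variance of a diagonal Gaussian over weights."""
    pairs = np.stack([x_ctx, y_ctx], axis=-1)       # (n, 2)
    h = np.tanh(pairs @ enc["W1"] + enc["b1"])      # (n, embed_dim)
    pooled = h.mean(axis=0)                         # mean pooling: order of points is irrelevant
    return pooled @ enc["W_mu"], pooled @ enc["W_logvar"]


def sample_weights(rng, mu, logvar):
    """Reparameterised sample: w = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def bnn_decode(w, x, hidden=HIDDEN):
    """Decoder: a 1 -> hidden -> 1 MLP whose parameters are the sampled latent w."""
    W1, b1 = w[:hidden], w[hidden:2 * hidden]
    W2, b2 = w[2 * hidden:3 * hidden], w[3 * hidden]
    h = np.tanh(x[:, None] * W1 + b1)               # (m, hidden)
    return h @ W2 + b2                              # (m,)


# Usage: one context dataset in, one sampled function out.
rng = np.random.default_rng(0)
enc = init_encoder(rng)
x_ctx = np.linspace(-1.0, 1.0, 5)
y_ctx = np.sin(3.0 * x_ctx)
mu, logvar = amortised_posterior(enc, x_ctx, y_ctx)
w = sample_weights(rng, mu, logvar)
y_pred = bnn_decode(w, np.linspace(-1.0, 1.0, 7))
```

In a trained system the encoder parameters and a prior over `w` would be learned across many datasets. The sketch omits the objective entirely and only shows how data flows from a context set, through a weight sample, to predictions.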

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- This is an interesting paper that addresses shortcomings of Bayesian deep learning in a new way, as far as I know. The complexity of the model is both a pro and a con.
- I like the idea of amortizing the pseudo-observations of Ober and Aitchison (2021).
- Figure 6 was interesting. It shows that the learned prior is useful regardless of the inference method.
- The experiments focus on meaningful questions rather than on the largest-scale experiments.

Weaknesses

- There are a few differences with a standard NP, enough that I wonder if this is really an NP (see my questions below).
- Overall, this is a fairly complicated model, which makes it difficult to understand what each component is doing. I appreciated the discussion of the objective function in the appendix, but I think some of this discussion should appear in the main text. The objective doesn't feel well motivated as it's written now. More discussion of how this differs from a standard NP woul

Reviewer 02 · Rating 4 · Confidence 3

Strengths

* This work introduces an interesting idea of incorporating a meta-learning prior into BNNs and establishes a connection to neural processes.
* It also presents an amortized linear layer structure that enables each layer to function as a Bayesian layer with an amortized prior.

Weaknesses

* Although the proposed method is technically sophisticated, it is unclear what specific problem it aims to address with the proposed structure. For instance, it is not evident whether the main contribution lies in emphasizing the amortized prior for BNNs and its benefits, or in investigating approximate training for BNNs with a well-chosen prior.
* The training procedure for the parameters of the inference network and the prior distribution is insufficiently described. Beyond presenting the lo

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The presentation of the paper (language and structure) is well done. The theoretical section is sound and nicely concise. The experimental section is comprehensive and well-executed. The presented ablation studies, particularly those evaluating the quality of approximate inference against other VI methods and the qualitative and quantitative analysis of learned priors, strongly support the paper's claims. The authors provide a commendably clear and realistic discussion of the limitations of

Weaknesses

- **Training Loss Justification:** The paper's core training objective, PP-AVI, deviates from standard NP-losses like NP-VI. While Appendix A.4 provides a detailed justification for this choice over NP-VI, this reasoning should be (at least partly) integrated into the main text. Furthermore, the PP-AVI 'loss' is formulated as a maximization objective rather than a loss.
- **Clarification on Within-Task Minibatching:** The claim that "the ability to minibatch a forward pass over a given contex

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Gaussian Processes and Bayesian Inference