Amortising Inference and Meta-Learning Priors in Neural Networks
Tommy Rochussen, Vincent Fortuin

TL;DR
This paper proposes a method for learning priors for Bayesian neural networks using amortised variational inference. It bridges Bayesian deep learning and meta-learning, yielding flexible priors and improved meta-learning capabilities.
Contribution
It introduces a neural process-based model that learns weight priors from datasets, allowing Bayesian neural networks to be used as generative models and to perform meta-learning with limited data.
Findings
Enables Bayesian neural networks to learn priors from data collections.
Allows meta-learning and generative modeling with Bayesian neural networks.
Supports within-task minibatching and data-starved meta-learning scenarios.
Abstract
One of the core facets of Bayesianism is in the updating of prior beliefs in light of new evidence, so how can we maintain a Bayesian approach if we have no prior beliefs in the first place? This is one of the central challenges in the field of Bayesian deep learning, where it is not clear how to represent beliefs about a prediction task by prior distributions over model parameters. Bridging the fields of Bayesian deep learning and probabilistic meta-learning, we introduce a way to learn a weights prior from a collection of datasets by introducing a way to perform per-dataset amortised variational inference. The model we develop can be viewed as a neural process whose latent variable is the set of weights of a BNN and whose decoder is the neural network parameterised by a sample of the latent variable itself. This unique model allows us to study the behaviour of…
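The idea of a neural process whose latent variable is a set of network weights can be illustrated with a toy example. The following is a minimal sketch under strong simplifying assumptions, not the authors' architecture or objective: a single Bayesian linear layer with a Gaussian weights prior, where a hypothetical `inference_net` maps each context pair to a pseudo-observation (a pseudo-target and a positive pseudo-precision), so the approximate posterior over the weights is available in closed form. The names `inference_net`, `features`, `amortised_posterior`, and `predict`, and the parameterisation of the encoder, are all illustrative choices that do not come from the paper.

```python
import numpy as np

def features(x):
    # Inputs to the (single) Bayesian linear layer: a bias term and x itself.
    return np.stack([np.ones_like(x), x], axis=1)  # shape (N, 2)

def inference_net(x, y, params):
    # Hypothetical per-datapoint encoder (an assumption, not the paper's network):
    # maps each (x_i, y_i) to a pseudo-target and a positive pseudo-precision.
    a, b, log_prec = params
    pseudo_target = a * y + b                          # shape (N,)
    pseudo_prec = np.full(y.shape, np.exp(log_prec))   # shape (N,)
    return pseudo_target, pseudo_prec

def amortised_posterior(phi, pseudo_target, pseudo_prec, prior_mean, prior_cov):
    # Gaussian prior times Gaussian pseudo-likelihoods gives a Gaussian posterior.
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + (phi * pseudo_prec[:, None]).T @ phi
    post_cov = np.linalg.inv(post_prec)
    post_cov = 0.5 * (post_cov + post_cov.T)           # symmetrise for sampling
    post_mean = post_cov @ (prior_prec @ prior_mean
                            + phi.T @ (pseudo_prec * pseudo_target))
    return post_mean, post_cov

def predict(x_target, post_mean, post_cov, rng, n_samples=100):
    # "Decoder": the network parameterised by sampled weights.
    w = rng.multivariate_normal(post_mean, post_cov, size=n_samples)  # (S, 2)
    return w @ features(x_target).T                                   # (S, M)

# Usage: one task (dataset) with context points from y = 2x + 1.
rng = np.random.default_rng(0)
x_ctx = np.linspace(-1.0, 1.0, 50)
y_ctx = 2.0 * x_ctx + 1.0
targets, precs = inference_net(x_ctx, y_ctx, (1.0, 0.0, np.log(100.0)))
mean, cov = amortised_posterior(features(x_ctx), targets, precs,
                                prior_mean=np.zeros(2), prior_cov=np.eye(2))
samples = predict(np.array([0.0, 1.0]), mean, cov, rng, n_samples=5)
```

In this sketch the prior parameters (`prior_mean`, `prior_cov`) and the encoder parameters are fixed by hand; in a meta-learning setting they would be the quantities trained across a collection of datasets. With an empty context set the function simply returns the prior, and each context point enters the posterior precision additively.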
Peer Reviews
Decision: ICLR 2026 Poster
- This is an interesting paper that addresses shortcomings of Bayesian deep learning in a new way, as far as I know. The complexity of the model is both a pro and a con.
- I like the idea of amortizing the pseudo-observations of Ober and Aitchison 2021.
- Figure 6 was interesting. This shows the learned prior is useful regardless of the inference method.
- The experiments focus on meaningful questions, rather than the largest-scale experiments.
- There are a few differences with a standard NP, enough that I wonder if this is really an NP (see my questions below).
- Overall, this is a fairly complicated model, which makes it difficult to understand what each component is doing. I appreciated the discussion of the objective function in the appendix, but I think some of this discussion should appear in the main text. The objective doesn’t feel well motivated as it’s written now. More discussion of how this differs from a standard NP woul
* This work introduces an interesting idea of incorporating a meta-learning prior into BNNs and establishes a connection to neural processes.
* It also presents an amortized linear layer structure that enables each layer to function as a Bayesian layer with an amortized prior.
* Although the proposed method is technically sophisticated, it is unclear what specific problem it aims to address with the proposed structure. For instance, it is not evident whether the main contribution lies in emphasizing the amortized prior for BNNs and its benefits, or in investigating approximate training for BNNs with a well-chosen prior.
* The training procedure for the parameters of the inference network and the prior distribution is insufficiently described. Beyond presenting the lo
The presentation of the paper (language and structure) is well done. The theoretical section is sound and nicely concise. The experimental section is comprehensive and well-executed. The presented ablation studies, particularly those evaluating the quality of approximate inference against other VI methods and the qualitative and quantitative analysis of learned priors, strongly support the paper's claims. The authors provide a commendably clear and realistic discussion of the limitations of
- **Training Loss Justification:** The paper's core training objective, PP-AVI, deviates from standard NP-losses like NP-VI. While Appendix A.4 provides a detailed justification for this choice over NP-VI, this reasoning should be (at least partly) integrated into the main text. Furthermore, the PP-AVI 'loss' is formulated as a maximization objective rather than a loss.
- **Clarification on Within-Task Minibatching:** The claim that "the ability to minibatch a forward pass over a given contex
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Gaussian Processes and Bayesian Inference
