Distribution Transformers: Fast Approximate Bayesian Inference With On-The-Fly Prior Adaptation
George Whittle, Juliusz Ziomek, Jacob Rawling, Maike A. Osborne

TL;DR
The paper introduces the Distribution Transformer, a neural architecture that performs fast, approximate Bayesian inference by using attention mechanisms to transform prior distributions into posterior distributions, making it suitable for real-time applications.
Contribution
It presents a novel architecture that learns to map priors to posteriors efficiently, reducing computation time significantly while maintaining high inference quality.
Findings
Reduces inference time from minutes to milliseconds.
Achieves log-likelihood performance comparable or superior to that of existing methods.
Demonstrates effectiveness across diverse tasks such as sequential inference and quantum parameter estimation.
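Sequential inference is the setting where on-the-fly prior adaptation matters. After each observation, the posterior becomes the prior for the next step, so a method that needs retraining whenever the prior changes fits poorly. The Python sketch below makes that loop concrete for a toy case that happens to have a closed form: a one-dimensional GMM prior with a Gaussian likelihood of known noise variance. It is the exact conjugate update, not the Distribution Transformer, and the names `GMM1D`, `update`, and `noise_var` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GMM1D:
    """Hypothetical container for a 1-D Gaussian mixture: sum_k w_k N(theta; m_k, v_k)."""
    weights: List[float]
    means: List[float]
    variances: List[float]


def update(prior: GMM1D, y: float, noise_var: float) -> GMM1D:
    """Exact posterior p(theta | y) for y ~ N(theta, noise_var) under a GMM prior."""
    log_w, means, variances = [], [], []
    for w, m, v in zip(prior.weights, prior.means, prior.variances):
        # Evidence of this component: marginally, y ~ N(m, v + noise_var).
        s = v + noise_var
        log_w.append(math.log(w) - 0.5 * math.log(2 * math.pi * s) - 0.5 * (y - m) ** 2 / s)
        # Gaussian-Gaussian conjugate update: precisions add, means are precision-weighted.
        post_v = 1.0 / (1.0 / v + 1.0 / noise_var)
        variances.append(post_v)
        means.append(post_v * (m / v + y / noise_var))
    # Renormalise the reweighted components in log space to avoid underflow.
    top = max(log_w)
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return GMM1D([u / total for u in unnorm], means, variances)


# Hypothetical toy run: each posterior becomes the prior for the next observation.
belief = GMM1D(weights=[0.5, 0.5], means=[-2.0, 2.0], variances=[1.0, 1.0])
for y in [1.8, 2.3, 1.9]:
    belief = update(belief, y, noise_var=0.5)
print(belief)
```

In this conjugate toy case no learning is needed. An amortized prior-to-posterior map becomes attractive when such closed forms are unavailable, which is the general situation the abstract describes, while the prior still changes at every step.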
Abstract
While Bayesian inference provides a principled framework for reasoning under uncertainty, its widespread adoption is limited by the intractability of exact posterior computation, necessitating the use of approximate inference. However, existing methods are often computationally expensive, or demand costly retraining when priors change, limiting their utility, particularly in sequential inference problems such as real-time sensor fusion. To address these challenges, we introduce the Distribution Transformer -- a novel architecture that can learn arbitrary distribution-to-distribution mappings. Our method can be trained to map a prior to the corresponding posterior, conditioned on some dataset -- thus performing approximate Bayesian inference. Our novel architecture represents a prior distribution as a (universally-approximating) Gaussian Mixture Model (GMM), and transforms it into a GMM…
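The abstract describes representing the prior as a GMM and transforming it into a GMM, and the TL;DR attributes this transformation to attention mechanisms. One plausible reading is to treat each mixture component as a token that attends to tokens built from the observations. The NumPy sketch below is a hypothetical illustration of that idea, not the authors' architecture. The token layout, the single cross-attention layer, the residual readout, and all names (`distribution_transform`, `W_comp`, `W_data`, `W_out`, `D`) are assumptions, and the weights are random rather than trained, so the numbers it prints carry no meaning. What it does show is structural: a GMM goes in, a valid GMM comes out, and the result does not depend on the order of the observations.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hypothetical embedding width

# Hypothetical, untrained parameters; a real model would learn these.
W_comp = rng.normal(scale=0.1, size=(3, D))  # embeds (log weight, mean, log variance)
W_data = rng.normal(scale=0.1, size=(1, D))  # embeds one scalar observation
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(D, D)) for _ in range(3))
W_out = rng.normal(scale=0.1, size=(D, 3))   # reads out parameter corrections


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def distribution_transform(weights, means, variances, data):
    """Map a 1-D GMM prior plus observations to a 1-D GMM (illustrative sketch only)."""
    comp = np.stack([np.log(weights), means, np.log(variances)], axis=-1)  # (K, 3)
    h = comp @ W_comp                                    # one token per mixture component
    x = np.asarray(data, dtype=float)[:, None] @ W_data  # one token per observation
    q, k, v = h @ W_q, x @ W_k, x @ W_v
    attn = softmax(q @ k.T / np.sqrt(D), axis=-1)        # (K, N): components attend over data
    h = h + attn @ v                                     # residual update of component tokens
    out = comp + h @ W_out                               # corrected (log w, mean, log var)
    # Output heads keep the result a valid GMM: weights sum to 1, variances positive.
    return softmax(out[:, 0]), out[:, 1], np.exp(out[:, 2])


prior_w = np.array([0.5, 0.5])
prior_m = np.array([-2.0, 2.0])
prior_v = np.array([1.0, 1.0])
w, m, v = distribution_transform(prior_w, prior_m, prior_v, [1.8, 2.3, 1.9])
print(w, m, v)
```

Because the attention output is a softmax-weighted sum over observation tokens, shuffling the data leaves the posterior unchanged. Softmax on the weight head and exp on the variance head are one simple way to guarantee a valid mixture; a trained model could parameterize its outputs differently.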
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam
