Distribution Transformers: Fast Approximate Bayesian Inference With On-The-Fly Prior Adaptation

George Whittle; Juliusz Ziomek; Jacob Rawling; Maike A. Osborne

arXiv:2502.02463 · stat.ML · May 19, 2026


TL;DR

The paper introduces the Distribution Transformer, a neural architecture that performs fast, approximate Bayesian inference by using attention mechanisms to transform prior distributions into posterior distributions, which makes it suitable for real-time applications.

Contribution

The paper presents a novel architecture that learns to map priors to posteriors efficiently, significantly reducing computation time while maintaining high inference quality.

Findings

01. Reduces inference time from minutes to milliseconds.

02. Achieves comparable or superior log-likelihood performance to existing methods.

03. Demonstrates effectiveness across diverse tasks like sequential inference and quantum parameter estimation.

Abstract

While Bayesian inference provides a principled framework for reasoning under uncertainty, its widespread adoption is limited by the intractability of exact posterior computation, necessitating the use of approximate inference. However, existing methods are often computationally expensive, or demand costly retraining when priors change, limiting their utility, particularly in sequential inference problems such as real-time sensor fusion. To address these challenges, we introduce the Distribution Transformer -- a novel architecture that can learn arbitrary distribution-to-distribution mappings. Our method can be trained to map a prior to the corresponding posterior, conditioned on some dataset -- thus performing approximate Bayesian inference. Our novel architecture represents a prior distribution as a (universally-approximating) Gaussian Mixture Model (GMM), and transforms it into a GMM…
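To make the GMM-prior-to-GMM-posterior mapping concrete, the sketch below works through a toy case where the mapping is available in closed form: a one-dimensional parameter with a Gaussian mixture prior and Gaussian observations of known noise variance. This is not the Distribution Transformer itself, only an illustration of the kind of distribution-to-distribution map such a model would be trained to approximate when no closed form exists. The function names (`update_one`, `posterior`) and the tuple layout `(weight, mean, variance)` are assumptions made for this example.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def update_one(prior, x, noise_var):
    """Exact Bayesian update of a 1D GMM prior after one observation.

    prior: list of (weight, mean, variance) tuples (hypothetical layout).
    The observation model is x ~ N(theta, noise_var), an assumption of this toy.
    """
    unnormalised = []
    for w, m, v in prior:
        # Evidence of x under this component: N(x; m, v + noise_var).
        evidence = normal_pdf(x, m, v + noise_var)
        # Standard conjugate Gaussian update for the component itself.
        post_var = 1.0 / (1.0 / v + 1.0 / noise_var)
        post_mean = post_var * (m / v + x / noise_var)
        unnormalised.append((w * evidence, post_mean, post_var))
    total = sum(w for w, _, _ in unnormalised)
    return [(w / total, m, v) for w, m, v in unnormalised]


def posterior(prior, data, noise_var):
    """Sequential inference: each posterior becomes the prior for the next datum."""
    current = prior
    for x in data:
        current = update_one(current, x, noise_var)
    return current


# A bimodal prior, updated on two observations that favour the right-hand mode.
prior = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]
post = posterior(prior, [1.5, 2.5], noise_var=1.0)
for weight, mean, var in post:
    print(f"weight={weight:.4f} mean={mean:.4f} var={var:.4f}")
```

Because each posterior is again a GMM, it can be fed back in as the next prior without refitting anything, which is the sequential setting the abstract mentions. In this conjugate toy the result does not depend on the order in which the observations arrive.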


Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Machine Learning and Algorithms

Methods: Attention Is All You Need · Label Smoothing · Layer Normalization · Linear Layer · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam