Estimating Probability Densities with Transformer and Denoising Diffusion
Henry W. Leung, Jo Bovy, Joshua S. Speagle

TL;DR
This paper places a denoising diffusion head on top of a Transformer so that the model outputs a full probability density, including non-Gaussian and multimodal ones, rather than a point estimate on regression problems. It works even for high-dimensional inputs and is demonstrated on astronomical datasets.
Contribution
It combines a Transformer with a denoising diffusion head to produce flexible probability density estimates, addressing the fact that Transformers trained on regression problems do not by themselves estimate the probability density of the answer.
Findings
Provides reasonable probability density estimates even for high-dimensional inputs.
Conditions the output probability density on arbitrary combinations of inputs, acting as a density emulator across input/output combinations.
Applied to astronomical data for inference tasks.
Abstract
Transformers are often the go-to architecture to build foundation models that ingest a large amount of training data. But these models do not estimate the probability density distribution when trained on regression problems, yet obtaining full probabilistic outputs is crucial to many fields of science, where the probability distribution of the answer can be non-Gaussian and multimodal. In this work, we demonstrate that training a probabilistic model using a denoising diffusion head on top of the Transformer provides reasonable probability density estimation even for high-dimensional inputs. The combined Transformer+Denoising Diffusion model allows conditioning the output probability density on arbitrary combinations of inputs and it is thus a highly flexible density function emulator of all possible input/output combinations. We illustrate our Transformer+Denoising Diffusion model by…
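The abstract does not spell out the training objective of the diffusion head. As a rough illustration only, the sketch below assumes the standard DDPM-style noise-prediction objective with a linear noise schedule, applied to a one-dimensional target. The names `make_alpha_bars`, `add_noise`, `denoising_loss`, and `predict_noise` are hypothetical, and `context` merely stands in for whatever embedding the Transformer would supply; none of this is taken from the authors' code.

```python
import math
import random


def make_alpha_bars(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.gauss(0.0, 1.0)
    abar = alpha_bars[t]
    xt = math.sqrt(abar) * x0 + math.sqrt(1.0 - abar) * eps
    return xt, eps


def denoising_loss(predict_noise, x0, context, alpha_bars, rng):
    """Squared error between the true noise and the predicted noise.

    predict_noise(xt, t, context) is a hypothetical stand-in for the
    diffusion head; context plays the role of the Transformer embedding.
    """
    t = rng.randrange(len(alpha_bars))
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    eps_hat = predict_noise(xt, t, context)
    return (eps_hat - eps) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars()
    # A trivial predictor that always guesses zero noise, for illustration.
    loss = denoising_loss(lambda xt, t, ctx: 0.0, 1.5, None, alpha_bars, rng)
    print(f"loss for the zero predictor: {loss:.4f}")
```

In a trained model of this kind, samples of the target are drawn by running the learned denoiser backwards from pure noise for a fixed context, and the density is estimated from those samples; that sampling loop is omitted here.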
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Diffusion · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention
