Probabilistic Transformers
Javier R. Movellan, Prasad Gabbur

TL;DR
This paper interprets Transformers as maximum a posteriori (MAP) estimators for Gaussian mixture models, which opens the way to probabilistic extensions of the architecture.
Contribution
It introduces a probabilistic perspective on Transformers that links them to Gaussian mixture models and suggests extensions to other probabilistic settings.
Findings
Transformers can be viewed as maximum posterior probability estimators for Gaussian mixtures.
This perspective enables probabilistic extensions of Transformer models.
The approach bridges deep learning and probabilistic modeling.
Abstract
We show that Transformers are Maximum Posterior Probability estimators for Mixtures of Gaussian Models. This brings a probabilistic point of view to Transformers and suggests extensions to other probabilistic cases.
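A minimal sketch of the kind of correspondence the abstract points to, under stated assumptions rather than the paper's own derivation (which is not reproduced in this listing). It checks numerically a standard identity: for a Gaussian mixture whose component means are the keys, with uniform mixing weights, a shared isotropic covariance sigma2 * I, and equal-norm keys, the posterior responsibilities given a query equal the softmax attention weights softmax(q . k / sigma2). The function names, the choice sigma2 = sqrt(d), and the random data are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def gmm_responsibilities(q, keys, sigma2):
    # Posterior p(component j | q) for a mixture of Gaussians with
    # means = keys, uniform priors, and covariance sigma2 * I (assumed).
    sq_dist = np.sum((keys - q) ** 2, axis=1)
    return softmax(-sq_dist / (2.0 * sigma2))

def attention_weights(q, keys, sigma2):
    # Dot-product attention weights with temperature sigma2.
    return softmax(keys @ q / sigma2)

rng = np.random.default_rng(0)
d, n = 8, 5
keys = rng.normal(size=(n, d))
# Assumption: equal-norm keys, so the ||k||^2 term cancels in the softmax.
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(n, 3))
q = rng.normal(size=d)
sigma2 = np.sqrt(d)  # matches the usual 1/sqrt(d) attention scaling

r = gmm_responsibilities(q, keys, sigma2)
a = attention_weights(q, keys, sigma2)
output = a @ values  # responsibility-weighted average of the values

print(np.allclose(r, a))
```

The identity follows from expanding the squared distance: the query-norm term is shared by all components and the key-norm term is constant by assumption, so only the dot product survives normalization. When keys have unequal norms, the two weightings differ unless the mixing weights absorb the key-norm term.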
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Bayesian Modeling and Causal Inference
