Exponential Family Attention
Kevin Christian Wibisono, Yixin Wang

TL;DR
Exponential Family Attention (EFA) extends self-attention into a probabilistic model for complex, high-dimensional, mixed-type data. It captures dynamic interactions among observations and outperforms existing models on various real-world datasets.
Contribution
EFA introduces a probabilistic generative model that leverages attention for dynamic, high-dimensional data, with theoretical guarantees and superior empirical performance.
Findings
EFA outperforms existing models in real-world data reconstruction.
EFA captures complex latent structures effectively.
EFA comes with theoretical guarantees on identifiability and generalization.
Abstract
The self-attention mechanism is the backbone of the transformer neural network underlying most large language models. It can capture complex word patterns and long-range dependencies in natural language. This paper introduces exponential family attention (EFA), a probabilistic generative model that extends self-attention to handle high-dimensional sequence, spatial, or spatial-temporal data of mixed data types, including both discrete and continuous observations. The key idea of EFA is to model each observation conditional on all other existing observations, called the context, whose relevance is learned in a data-driven way via an attention-based latent factor model. In particular, unlike static latent embeddings, EFA uses the self-attention mechanism to capture dynamic interactions in the context, where the relevance of each context observation depends on other observations. We…
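The key idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the single-head attention, the mean pooling, the inner-product natural parameter, and every name below (`context_representation`, `natural_parameter`, `W_q`, `W_k`, `W_v`, `target_emb`) are assumptions made for illustration. The sketch shows only the general pattern: context embeddings interact through self-attention, the result is summarized into a context vector, and that vector sets the natural parameter of an exponential family likelihood for the target observation (Bernoulli for a discrete observation, unit-variance Gaussian for a continuous one).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_representation(ctx_emb, W_q, W_k, W_v):
    """Single-head self-attention over the context, then mean pooling.

    Assumption: the pooling and the single head are illustrative choices,
    not taken from the paper.
    """
    Q, K, V = ctx_emb @ W_q, ctx_emb @ W_k, ctx_emb @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    A = softmax(scores, axis=-1)   # (n, n): each row sums to 1
    H = A @ V                      # each context item re-weighted by the others
    return H.mean(axis=0), A

def natural_parameter(target_emb, ctx_vec):
    # Assumption: the natural parameter is an inner product between a
    # target embedding and the attention-based context vector.
    return float(target_emb @ ctx_vec)

def bernoulli_log_lik(x, eta):
    # Bernoulli in natural form: log p(x | eta) = x * eta - log(1 + exp(eta)).
    return x * eta - np.logaddexp(0.0, eta)

def gaussian_log_lik(x, eta):
    # Unit-variance Gaussian whose mean equals the natural parameter.
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - eta) ** 2

# Toy data: 5 context observations with 4-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 5, 4
ctx_emb = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
target_emb = rng.normal(size=d)

ctx_vec, A = context_representation(ctx_emb, W_q, W_k, W_v)
eta = natural_parameter(target_emb, ctx_vec)
ll_discrete = bernoulli_log_lik(1, eta)      # a binary target observation
ll_continuous = gaussian_log_lik(0.3, eta)   # a real-valued target observation
```

Because the attention weights in `A` are computed from the context itself, the contribution of each context observation changes with the other observations present, which is the contrast with static latent embeddings that the abstract draws. Swapping the likelihood function is what would let one such model cover mixed data types.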
Taxonomy
Topics: Attachment and Relationship Dynamics
Methods: Softmax · Attention Is All You Need
