Exponential Family Attention

Kevin Christian Wibisono; Yixin Wang

arXiv:2501.16790·stat.ML·January 29, 2025

TL;DR

Exponential Family Attention (EFA) extends self-attention to a probabilistic model of complex, high-dimensional, mixed-type data. It captures dynamic interactions among observations and outperforms existing models on various real-world datasets.

Contribution

EFA is a probabilistic generative model that uses attention to model dynamic, high-dimensional data. It comes with theoretical guarantees on identifiability and generalization and outperforms existing models in real-world data reconstruction.

Findings

01

EFA outperforms existing models in real-world data reconstruction.

02

EFA captures complex latent structures effectively.

03

EFA comes with theoretical guarantees on identifiability and generalization.

Abstract

The self-attention mechanism is the backbone of the transformer neural network underlying most large language models. It can capture complex word patterns and long-range dependencies in natural language. This paper introduces exponential family attention (EFA), a probabilistic generative model that extends self-attention to handle high-dimensional sequence, spatial, or spatial-temporal data of mixed data types, including both discrete and continuous observations. The key idea of EFA is to model each observation conditional on all other existing observations, called the context, whose relevance is learned in a data-driven way via an attention-based latent factor model. In particular, unlike static latent embeddings, EFA uses the self-attention mechanism to capture dynamic interactions in the context, where the relevance of each context observation depends on the other observations. We…
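
The idea described in the abstract (each observation modeled conditionally on its context, with context relevance set by attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the official code is in the yixinw-lab/efa repository. Every name here (`natural_params`, `cond_log_lik`, `pos_emb`, `val_emb`, `target_emb`, `W_q`, `W_k`, `W_v`) is hypothetical, and the single-head, position-only-query design is an assumption chosen so that the natural parameter for observation i never depends on observation i itself.

```python
import math
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def natural_params(x, pos_emb, val_emb, target_emb, W_q, W_k, W_v):
    """Return (eta, attn); eta[i] depends on x[j] for j != i, never on x[i]."""
    n, d = pos_emb.shape                      # assumes n >= 2
    ctx = pos_emb + x[:, None] * val_emb      # context embeddings carry the observed values
    q = pos_emb @ W_q                         # queries use position only, so x[i] cannot leak
    k, v = ctx @ W_k, ctx @ W_v
    scores = q @ k.T / np.sqrt(d)
    np.fill_diagonal(scores, -np.inf)         # an observation is not part of its own context
    attn = softmax(scores, axis=1)            # relevance weight of each context observation
    h = attn @ v                              # attention-weighted context summary
    eta = np.sum(target_emb * h, axis=1)      # one natural parameter per observation
    return eta, attn

def cond_log_lik(x_i, eta_i, family):
    """Log-density of one observation under a simple exponential family."""
    if family == "gaussian":                  # unit variance, mean eta
        return -0.5 * (x_i - eta_i) ** 2 - 0.5 * math.log(2 * math.pi)
    if family == "bernoulli":                 # eta is the logit
        return x_i * eta_i - np.logaddexp(0.0, eta_i)
    if family == "poisson":                   # eta is the log rate
        return x_i * eta_i - math.exp(eta_i) - math.lgamma(x_i + 1)
    raise ValueError(f"unknown family: {family}")

# Toy mixed-type sequence: counts, binary values, and one real value.
rng = np.random.default_rng(0)
n, d = 5, 4
x = np.array([2.0, 0.0, 1.0, 3.0, 1.0])
families = ["poisson", "bernoulli", "bernoulli", "poisson", "gaussian"]
pos_emb, val_emb, target_emb = (rng.normal(size=(n, d)) for _ in range(3))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

eta, attn = natural_params(x, pos_emb, val_emb, target_emb, W_q, W_k, W_v)
pseudo_ll = sum(cond_log_lik(x[i], eta[i], families[i]) for i in range(n))
print("attention weights (rows sum to 1, zero diagonal):")
print(attn.round(3))
print("pseudo log-likelihood:", float(pseudo_ll))
```

In this sketch the parameters are random; in practice they would be fit by maximizing the summed conditional log-likelihood. Because the attention weights are recomputed from the context values, the relevance of one context observation shifts when the others change, which is the contrast with static embeddings that the abstract draws.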

Code & Models

Repositories

yixinw-lab/efa (Official)

Taxonomy

Topics: Attachment and Relationship Dynamics

Methods: Softmax · Attention Is All You Need