Kernel Deformed Exponential Families for Sparse Continuous Attention
Alexander Moreno, Supriya Nagesh, Zhenke Wu, Walter Dempsey, James M. Rehg

TL;DR
This paper introduces kernel deformed exponential families for continuous attention, enabling sparse and flexible focus on multiple data regions, with theoretical guarantees and practical effectiveness demonstrated through experiments.
Contribution
It extends continuous attention mechanisms by proposing kernel deformed exponential families, which have sparse support and approximation capabilities comparable to those of kernel exponential families.
Findings
Kernel deformed exponential families can attend to multiple data regions.
Theoretical existence results for these families are established.
Experiments demonstrate effective sparse attention in practice.
Abstract
Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families; the latter have sparse support. Farinhas et al. (2021) extended this to Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Experiments show that…
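The idea of attention as an expectation under a density can be illustrated with a minimal numerical sketch. It is not the paper's implementation: the grid, the value function `values`, the Gaussian kernel, the two centers, and the 0.5 threshold are all assumptions chosen for illustration. The sparse density here is simply the positive part of a kernel expansion, used as a stand-in for a multimodal density with sparse support; the paper's exact parametrization and normalization may differ.

```python
import numpy as np

def normalize(unnormalized, grid):
    """Rescale a nonnegative function on a uniform grid so it integrates to 1."""
    dt = grid[1] - grid[0]
    return unnormalized / (unnormalized.sum() * dt)

def continuous_attention(values, density, grid):
    """Approximate E_p[V(t)] = integral of p(t) V(t) dt with a Riemann sum."""
    dt = grid[1] - grid[0]
    return (density[:, None] * values).sum(axis=0) * dt

# Hypothetical setup: time domain [0, 1] and a 2-dimensional value function V(t).
grid = np.linspace(0.0, 1.0, 1001)
values = np.stack([np.sin(2 * np.pi * grid), grid], axis=1)

# Dense, unimodal attention density: a Gaussian, positive everywhere on the grid.
dense = normalize(np.exp(-(grid - 0.5) ** 2 / (2 * 0.1 ** 2)), grid)

# Sparse, multimodal attention density (illustrative assumption):
# positive part of a Gaussian-kernel expansion minus a threshold.
centers = np.array([0.2, 0.8])
kernel = np.exp(-(grid[:, None] - centers[None, :]) ** 2 / (2 * 0.05 ** 2))
score = kernel.sum(axis=1) - 0.5
sparse = normalize(np.maximum(score, 0.0), grid)

ctx_dense = continuous_attention(values, dense, grid)
ctx_sparse = continuous_attention(values, sparse, grid)
```

In this sketch, `sparse` is exactly zero away from the two centers, so the resulting context vector ignores those regions entirely, whereas `dense` assigns some weight to every point on the grid.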
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
