Kernel Deformed Exponential Families for Sparse Continuous Attention

Alexander Moreno; Supriya Nagesh; Zhenke Wu; Walter Dempsey; James M. Rehg

arXiv:2111.01222·cs.LG·November 16, 2021

TL;DR

This paper introduces kernel deformed exponential families for continuous attention, enabling sparse and flexible focus on multiple data regions, with theoretical guarantees and practical effectiveness demonstrated through experiments.

Contribution

The paper extends continuous attention mechanisms by proposing kernel deformed exponential families, which have sparse support and approximation capabilities comparable to those of kernel exponential families.

Findings

01

Kernel deformed exponential families can attend to multiple data regions.

02

Theoretical existence results for these families are established.

03

Experiments demonstrate effective sparse attention in practice.

Abstract

Attention mechanisms take an expectation of a data representation with respect to probability weights. This creates summary statistics that focus on important features. Recently, Martins et al. (2020, 2021) proposed continuous attention mechanisms, focusing on unimodal attention densities from the exponential and deformed exponential families: the latter has sparse support. Farinhas et al. (2021) extended this to use Gaussian mixture attention densities, which are a flexible class with dense support. In this paper, we extend this to two general flexible classes: kernel exponential families and our new sparse counterpart, kernel deformed exponential families. Theoretically, we show new existence results for both kernel exponential and deformed exponential families, and that the deformed case has similar approximation capabilities to kernel exponential families. Experiments show that…
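
As an illustration of the abstract's opening idea, the sketch below computes a continuous attention context vector as an expectation of a value function under a sparse, bimodal density on [0, 1]. It is a minimal NumPy sketch under assumptions that are not taken from the paper: the Gaussian kernel, the centers, weights, bandwidth, and offset, the toy value function, and normalization by a Riemann sum are all hypothetical choices, and the density is formed by normalizing a deformed exponential of a kernel-weighted score rather than by the authors' own parameterization.

```python
import numpy as np

def deformed_exp(u, beta=0.0):
    """Deformed exponential [1 + (1 - beta) * u]_+ ** (1 / (1 - beta)), beta != 1."""
    base = np.maximum(1.0 + (1.0 - beta) * u, 0.0)
    return base ** (1.0 / (1.0 - beta))

def gaussian_kernel(t, centers, bandwidth):
    """Gaussian kernel matrix between grid points t and kernel centers."""
    diff = t[:, None] - centers[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def sparse_attention_density(t, centers, weights, bandwidth, offset, beta=0.0):
    """Illustrative sparse density: deformed exponential of a kernel-weighted score.

    Normalized with a Riemann sum on the uniform grid t (an assumption made
    for simplicity, not the paper's normalization procedure).
    """
    score = gaussian_kernel(t, centers, bandwidth) @ weights + offset
    unnorm = deformed_exp(score, beta)
    dt = t[1] - t[0]
    return unnorm / (unnorm.sum() * dt)

def continuous_attention(values, density, t):
    """Context vector: expectation of values(t) under the density (Riemann sum)."""
    dt = t[1] - t[0]
    return (values * density[:, None]).sum(axis=0) * dt

# Hypothetical setup: two attended regions around t = 0.2 and t = 0.8.
t = np.linspace(0.0, 1.0, 1001)
centers = np.array([0.2, 0.8])
weights = np.array([3.0, 3.0])
density = sparse_attention_density(t, centers, weights, bandwidth=0.05, offset=-2.0)

# Toy value function V(t) = [t, t^2], standing in for a learned representation.
values = np.stack([t, t ** 2], axis=1)
context = continuous_attention(values, density, t)
print("context vector:", context)
print("fraction of grid with nonzero density:", (density > 0).mean())
```

With these hypothetical settings the density is exactly zero between the two bumps, which is the kind of sparse, multimodal focus the abstract describes.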

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare