Bayesian Attention Modules

Xinjie Fan; Shujian Zhang; Bo Chen; Mingyuan Zhou

arXiv:2010.10604 · stat.ML · October 22, 2020 · 1 citation

TL;DR

This paper introduces a scalable Bayesian stochastic attention module that addresses the optimization challenges of stochastic attention and improves performance and interpretability across a range of neural network applications.

Contribution

It proposes a simple, differentiable, Bayesian stochastic attention mechanism using simplex-constrained distributions, applicable to multiple domains.

Findings

01 Consistent performance improvements over baselines
02 Effective in diverse tasks like graph classification and VQA
03 Enhances interpretability of attention models

Abstract

Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability. Most current models use deterministic attention modules due to their simplicity and ease of optimization. Stochastic counterparts, on the other hand, are less popular despite their potential benefits. The main reason is that stochastic attention often introduces optimization issues or requires significant model changes. In this paper, we propose a scalable stochastic version of attention that is easy to implement and optimize. We construct simplex-constrained attention distributions by normalizing reparameterizable distributions, making the training process differentiable. We learn their parameters in a Bayesian framework where a data-dependent prior is introduced for regularization. We apply the…
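The construction named in the abstract, normalizing samples from a reparameterizable distribution so that they land on the simplex, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the lognormal distribution, the function name `sample_stochastic_attention`, and the fixed `sigma` parameter are all choices made for this example, and the Bayesian part (the data-dependent prior and its regularization term) is left out.

```python
import math
import random


def sample_stochastic_attention(logits, sigma=0.5, rng=None):
    """Draw one set of attention weights that lie on the simplex.

    Hypothetical helper for illustration. Each unnormalized weight is a
    lognormal sample written as a deterministic function of standard
    normal noise (the reparameterization), so the sample is a smooth
    function of the logits. The weights are then normalized to sum to one.
    """
    rng = rng or random.Random()
    # Subtract the max logit for numerical stability; this common factor
    # cancels when the weights are normalized.
    max_logit = max(logits)
    unnormalized = []
    for phi in logits:
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on the parameters
        unnormalized.append(math.exp(phi - max_logit + sigma * eps))
    total = sum(unnormalized)
    return [s / total for s in unnormalized]


# Usage: three attention scores, one stochastic draw of the weights.
weights = sample_stochastic_attention([1.0, 2.0, 0.5], sigma=0.5, rng=random.Random(0))
```

With `sigma=0` the noise term vanishes and the function reduces to the ordinary softmax, which is one way to see deterministic attention as a special case of this sketch. In an autodiff framework the same expression would let gradients flow through the sample to the logits, since the randomness enters only through `eps`.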

Code & Models

Repositories

zhougroup/BAM · PyTorch · Official

Videos

Bayesian Attention Modules · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications