Bayesian Attention Belief Networks

Shujian Zhang; Xinjie Fan; Bo Chen; Mingyuan Zhou

arXiv:2106.05251 · cs.LG · June 10, 2021 · 1 citation

TL;DR

This paper introduces Bayesian attention belief networks that model attention weights probabilistically, improving performance, uncertainty estimation, and robustness across multiple tasks compared to deterministic and existing stochastic attention methods.

Contribution

The paper presents a hierarchical probabilistic model of attention that can be trained with a differentiable variational objective, and that allows existing deterministic-attention models, including pretrained ones, to be converted to Bayesian attention belief networks.

Findings

01. Outperforms deterministic attention in accuracy and robustness.

02. Achieves better uncertainty estimation and domain generalization.

03. Demonstrates versatility across language, translation, and visual tasks.

Abstract

Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention while stochastic attention is less explored due to the optimization difficulties or complicated model design. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any models with deterministic attention, including pretrained ones, to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method…
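The two ingredients the abstract names, a Weibull approximate posterior and a gamma prior over unnormalized attention weights, can be illustrated with a minimal single-layer sketch. This is not the authors' hierarchical model: the function names, the fixed shape parameter `k`, and the choice to match each Weibull mean to `exp(score)` are assumptions made for illustration only. The sketch shows why the combination is convenient: a Weibull draw is a differentiable transform of uniform noise, and the KL divergence from a Weibull to a gamma has a closed form that can serve as the regularizer in a variational lower bound.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_weibull(k, lam, rng):
    """Reparameterized draw: x = lam * (-log(1 - u))**(1/k), u ~ Uniform[0, 1)."""
    u = rng.random()
    return lam * (-math.log1p(-u)) ** (1.0 / k)


def kl_weibull_gamma(k, lam, alpha, beta):
    """Closed-form KL(Weibull(shape=k, scale=lam) || Gamma(shape=alpha, rate=beta))."""
    return (
        EULER_GAMMA * alpha / k
        - alpha * math.log(lam)
        + math.log(k)
        + beta * lam * math.gamma(1.0 + 1.0 / k)
        - EULER_GAMMA
        - 1.0
        - alpha * math.log(beta)
        + math.lgamma(alpha)
    )


def stochastic_attention(scores, k, rng):
    """Sample positive unnormalized weights per key, then normalize to sum to 1.

    Assumption (hypothetical, for illustration): each Weibull scale is chosen
    so that its mean equals exp(score - max_score); the shift by max_score is
    only for numerical stability and cancels in the normalization.
    """
    max_score = max(scores)
    draws = []
    for s in scores:
        lam = math.exp(s - max_score) / math.gamma(1.0 + 1.0 / k)
        draws.append(sample_weibull(k, lam, rng))
    total = sum(draws)
    return [d / total for d in draws]


if __name__ == "__main__":
    rng = random.Random(0)
    weights = stochastic_attention([0.1, 1.2, -0.5], k=10.0, rng=rng)
    print(weights, sum(weights))
    print(kl_weibull_gamma(k=10.0, lam=1.0, alpha=1.0, beta=1.0))
```

With a large `k` the Weibull concentrates around its mean, so the sampled weights stay close to the deterministic softmax; smaller `k` gives noisier attention. In a real model these operations would be written in an autodiff framework so that gradients flow through the sample to `k` and the scores.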

Videos

Bayesian Attention Belief Networks · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning