Bayesian Attention Belief Networks
Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou

TL;DR
This paper introduces Bayesian attention belief networks that model attention weights probabilistically, improving performance, uncertainty estimation, and robustness across multiple tasks compared to deterministic and existing stochastic attention methods.
Contribution
It presents a novel hierarchical probabilistic model for attention, enabling differentiable training and easy conversion of existing models to Bayesian attention belief networks.
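The conversion of a deterministic-attention model can be illustrated with a minimal Python sketch. It is built on stated assumptions and is not the authors' code. The sketch treats the deterministic model's attention logits as log-means of Weibull-distributed unnormalized weights. It draws those weights with the Weibull inverse-CDF reparameterization and then renormalizes them. The function names `weibull_rsample` and `stochastic_attention`, the shape parameter `k`, and the mean-matching choice of scale are hypothetical illustrations. The full model described in the paper also places a gamma hierarchy on these weights, and that hierarchy is omitted here.

```python
import math
import random

def weibull_rsample(k, lam, u):
    """Reparameterized draw from Weibull(shape=k, scale=lam) given uniform u in (0, 1).

    Inverse CDF: F(x) = 1 - exp(-(x/lam)^k)  =>  x = lam * (-log(1 - u))^(1/k).
    The result is a deterministic, differentiable function of lam once u is fixed.
    """
    return lam * (-math.log1p(-u)) ** (1.0 / k)

def stochastic_attention(logits, k=10.0, rng=random):
    """Sample normalized attention weights from deterministic attention logits.

    Hypothetical choice: the Weibull scale is set so that each unnormalized
    weight has mean exp(logit), since E[x] = lam * Gamma(1 + 1/k).
    """
    m = max(logits)  # shift for numerical stability; it cancels after normalization
    weights = []
    for z in logits:
        lam = math.exp(z - m) / math.gamma(1.0 + 1.0 / k)
        # Keep u strictly inside (0, 1) so every sample is finite and positive.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        weights.append(weibull_rsample(k, lam, u))
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    print(stochastic_attention([0.5, -1.0, 2.0], k=10.0, rng=random.Random(0)))
```

With `u` held fixed, each sample is a smooth function of the logits, so an autodiff framework could backpropagate through it. As `k` grows, the Weibull draws concentrate at their means, and the output approaches the ordinary softmax of the logits.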
Findings
Outperforms deterministic attention in accuracy and robustness.
Achieves better uncertainty estimation and domain generalization.
Demonstrates versatility across language, translation, and visual tasks.
Abstract
Attention-based neural networks have achieved state-of-the-art results on a wide range of tasks. Most such models use deterministic attention, while stochastic attention is less explored due to optimization difficulties or complicated model designs. This paper introduces Bayesian attention belief networks, which construct a decoder network by modeling unnormalized attention weights with a hierarchy of gamma distributions, and an encoder network by stacking Weibull distributions with a deterministic-upward-stochastic-downward structure to approximate the posterior. The resulting auto-encoding networks can be optimized in a differentiable way with a variational lower bound. It is simple to convert any model with deterministic attention, including pretrained ones, to the proposed Bayesian attention belief networks. On a variety of language understanding tasks, we show that our method…
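The variational lower bound contains a KL divergence from each Weibull-approximated weight to its gamma prior. For a Weibull q and a gamma p, this KL has a closed form. It follows from the Weibull entropy and the moments E[x] = lam * Gamma(1 + 1/k) and E[log x] = log(lam) - gamma_E/k, where gamma_E is the Euler-Mascheroni constant. The Python sketch below computes that closed form and checks it against a Monte Carlo estimate. It assumes a rate-parameterized gamma and scalar parameters. The function names are hypothetical, and the sketch is not claimed to be the authors' implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def kl_weibull_gamma(k, lam, alpha, beta):
    """Closed-form KL( Weibull(shape=k, scale=lam) || Gamma(shape=alpha, rate=beta) )."""
    return (EULER_GAMMA * alpha / k
            - alpha * math.log(lam)
            + math.log(k)
            + beta * lam * math.gamma(1.0 + 1.0 / k)
            - EULER_GAMMA - 1.0
            - alpha * math.log(beta)
            + math.lgamma(alpha))

def kl_monte_carlo(k, lam, alpha, beta, n=100_000, rng=random):
    """Monte Carlo estimate of the same KL: the average of log q(x) - log p(x) with x ~ q."""
    total = 0.0
    for _ in range(n):
        x = rng.weibullvariate(lam, k)  # stdlib argument order is (scale, shape)
        log_q = math.log(k / lam) + (k - 1.0) * math.log(x / lam) - (x / lam) ** k
        log_p = (alpha * math.log(beta) - math.lgamma(alpha)
                 + (alpha - 1.0) * math.log(x) - beta * x)
        total += log_q - log_p
    return total / n

if __name__ == "__main__":
    print(kl_weibull_gamma(2.0, 1.0, 2.0, 1.5))
    print(kl_monte_carlo(2.0, 1.0, 2.0, 1.5, rng=random.Random(0)))
```

The Monte Carlo estimator serves only as a sanity check. When k = 1 and beta = 1/lam, both distributions reduce to the same exponential, and the closed form returns zero.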
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
