Transformer Uncertainty Estimation with Hierarchical Stochastic Attention
Jiahuan Pei, Cheng Wang, György Szarvas

TL;DR
This paper introduces a hierarchical stochastic attention mechanism for transformers that enables uncertainty estimation without sacrificing predictive accuracy, validated on text classification tasks with in-domain and out-of-domain data.
Contribution
It proposes a novel hierarchical stochastic self-attention method that allows transformers to estimate uncertainty while maintaining high predictive performance.
Findings
Achieves the best uncertainty-performance trade-off among compared methods.
Maintains or improves predictive accuracy on in-domain datasets.
Performs comparably to Monte Carlo dropout and ensemble methods on out-of-domain uncertainty estimation.
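For readers unfamiliar with the sampling-based baselines named above, the following is a minimal sketch of one common way to turn several stochastic forward passes (Monte Carlo dropout samples or ensemble members) into a single uncertainty score: the entropy of the averaged class distribution. The function name `predictive_entropy` and the input layout are illustrative assumptions; this summary does not state which uncertainty metric the paper uses.

```python
import math


def predictive_entropy(prob_samples):
    """Entropy (in nats) of the mean predictive distribution.

    prob_samples: list of class-probability lists, one per stochastic
    forward pass (hypothetical layout, chosen for this illustration).
    """
    n_passes = len(prob_samples)
    n_classes = len(prob_samples[0])
    # Average the class probabilities over all stochastic passes.
    mean = [sum(p[c] for p in prob_samples) / n_passes for c in range(n_classes)]
    # Skip zero-probability classes, since p * log(p) tends to 0 as p -> 0.
    return -sum(p * math.log(p) for p in mean if p > 0.0)


# Passes that agree give low entropy; passes that disagree give high entropy.
agree = predictive_entropy([[0.9, 0.1], [0.9, 0.1]])
disagree = predictive_entropy([[0.9, 0.1], [0.1, 0.9]])
```

Higher entropy means the passes spread probability mass across classes, which is the behaviour one hopes to see on out-of-domain inputs.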
Abstract
Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustable machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of the uncertainty estimation of transformer models is under-explored. In this work, we propose a novel way to enable transformers to have the capability of uncertainty estimation and, meanwhile, retain the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. Then new attention heads are formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention…
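The abstract mentions forming attention heads from a mixture of sampled centroids via the Gumbel-Softmax trick. As a rough illustration of that trick only, the sketch below draws a relaxed one-hot sample over a set of centroids and uses it as mixture weights. The names `gumbel_softmax_sample` and `mix_centroids`, the temperature value, and the toy scores and centroids are all assumptions made for this example; the paper's actual formulation is not given in this summary.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Draw one relaxed one-hot sample over len(logits) categories."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise is -log(-log(u)); add it, then scale.
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_centroids(weights, centroids):
    """Convex combination of centroid vectors under the sampled weights."""
    dim = len(centroids[0])
    return [sum(w * c[d] for w, c in zip(weights, centroids)) for d in range(dim)]


# Hypothetical toy values: three 2-d centroids and their attention scores.
centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.5, 0.1]
weights = gumbel_softmax_sample(scores, temperature=0.5, rng=random.Random(0))
mixed = mix_centroids(weights, centroids)
```

Lower temperatures push the sampled weights toward a one-hot choice of a single centroid, while higher temperatures give smoother mixtures. Repeating the sampling step yields different mixtures for the same input, which is the source of the stochasticity that uncertainty estimates can be built on.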
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
Methods: Dropout · Monte Carlo Dropout
