Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

Jiahuan Pei; Cheng Wang; György Szarvas

arXiv:2112.13776·cs.CL·December 28, 2021

TL;DR

This paper introduces a hierarchical stochastic attention mechanism for transformers that enables uncertainty estimation without sacrificing predictive accuracy, validated on text classification tasks with in-domain and out-of-domain data.

Contribution

The paper proposes a hierarchical stochastic self-attention method that lets transformers estimate uncertainty while maintaining high predictive performance.

Findings

1. Achieves the best uncertainty-performance trade-off among compared methods.

2. Maintains or improves predictive accuracy on in-domain datasets.

3. Performs comparably to Monte Carlo dropout and ensemble methods on out-of-domain uncertainty estimation.

Abstract

Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustable machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the study of the uncertainty estimation of transformer models is under-explored. In this work, we propose a novel way to enable transformers to have the capability of uncertainty estimation and, meanwhile, retain the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. Then new attention heads are formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention…
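The Gumbel-Softmax step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `gumbel_softmax` and `mix_centroids`, the temperature `tau`, and the toy centroid values are all assumptions made for illustration, and the code uses only the Python standard library. It shows how adding Gumbel noise to a set of scores and applying a temperature-scaled softmax yields a random set of mixture weights, which are then used to combine centroid vectors.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return one relaxed one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def mix_centroids(weights, centroids):
    """Weighted sum of centroid vectors (one weight per centroid)."""
    dim = len(centroids[0])
    return [sum(w * c[d] for w, c in zip(weights, centroids)) for d in range(dim)]


rng = random.Random(0)  # seeded only to make the sketch repeatable
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy values, not learned
logits = [2.0, 0.5, -1.0]  # toy scores over the three centroids

# Each call draws fresh noise, so repeated passes give different mixtures.
samples = [mix_centroids(gumbel_softmax(logits, tau=0.5, rng=rng), centroids)
           for _ in range(5)]
```

A lower `tau` pushes the sampled weights toward a one-hot choice of a single centroid, while a higher `tau` flattens them toward a uniform mixture. The spread across `samples` is the kind of variation between stochastic passes that an uncertainty estimate can be built on.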

Code & Models

Repositories

amzn/sto-transformer (PyTorch, official)

Videos

Transformer Uncertainty Estimation with Hierarchical Stochastic Attention · underline

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)

Methods: Dropout · Monte Carlo Dropout