Pathologies in priors and inference for Bayesian transformers

Tristan Cinquin; Alexander Immer; Max Horn; Vincent Fortuin

arXiv:2110.04020 · cs.LG · October 18, 2021

TL;DR

This paper investigates why Bayesian inference is hard to apply to transformers, traces the difficulty to weight-space inference and poorly specified weight priors, and proposes a variational method that moves closer to function space by performing inference directly on the attention weights with Dirichlet distributions, with the aim of improving uncertainty estimation.

Contribution

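It introduces a variational inference method for transformers that places Dirichlet distributions on the attention weights and relies on implicit reparameterization of the Dirichlet distribution, as a way to address the prior and inference issues found in weight space.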
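To make the idea of Dirichlet-based attention concrete, here is a minimal NumPy sketch of one way Dirichlet randomness could enter an attention layer: each row of attention weights is drawn from a Dirichlet distribution whose mean is the usual softmax. It is not the authors' implementation. The function name `dirichlet_attention` and the `concentration` parameter are hypothetical, the choice of concentrations is an assumption made for illustration, and the sketch covers only forward sampling (no training, no KL term, no reparameterization gradients).

```python
import numpy as np


def dirichlet_attention(q, k, v, rng, concentration=50.0, n_samples=8):
    """Monte Carlo attention with Dirichlet-distributed attention weights.

    q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    Returns the sample mean and std of the outputs, each (n_q, d_v),
    plus the sampled weights of shape (n_samples, n_q, n_k).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                         # (n_q, n_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)     # softmax weights

    # Assumption: alpha = concentration * softmax(scores), so the Dirichlet
    # mean equals the deterministic attention weights and `concentration`
    # (hypothetical knob) controls how tightly samples cluster around it.
    alpha = concentration * probs

    # A Dirichlet sample is a vector of independent Gamma(alpha_i, 1) draws
    # normalised to sum to one; this batches the draw over rows and samples.
    gammas = rng.gamma(shape=alpha, scale=1.0, size=(n_samples,) + alpha.shape)
    weights = gammas / gammas.sum(axis=-1, keepdims=True)

    outputs = weights @ v                                 # (n_samples, n_q, d_v)
    return outputs.mean(axis=0), outputs.std(axis=0), weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(6, 4))
v = rng.normal(size=(6, 5))
mean, std, weights = dirichlet_attention(q, k, v, rng)
print(mean.shape, std.shape, weights.shape)  # (3, 5) (3, 5) (8, 3, 6)
```

The spread of the outputs across samples (`std`) gives a rough per-output uncertainty signal. In a trainable version the concentrations would be learned, which is where reparameterization gradients through the Dirichlet sampler would come in.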

Findings

01. Weight-space inference in transformers performs poorly regardless of the approximate posterior.

02. The prior distribution significantly affects Bayesian transformer performance.

03. The proposed Dirichlet-based method performs competitively with baseline approaches.

Abstract

In recent years, the transformer has established itself as a workhorse in many applications ranging from natural language processing to reinforcement learning. Similarly, Bayesian deep learning has become the gold standard for uncertainty estimation in safety-critical applications, where robustness and calibration are crucial. Surprisingly, no successful attempts to improve transformer models in terms of predictive uncertainty using Bayesian inference exist. In this work, we study this curiously underpopulated area of Bayesian transformers. We find that weight-space inference in transformers does not work well, regardless of the approximate posterior. We also find that the prior is at least partially at fault, but that it is very hard to find well-specified weight priors for these models. We hypothesize that these problems stem from the complexity of obtaining a meaningful mapping from…

Taxonomy

Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Gaussian Processes and Bayesian Inference

Methods: Variational Inference