Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift

Florian Seligmann; Philipp Becker; Michael Volpp; Gerhard Neumann

arXiv:2306.12306 · cs.LG · October 26, 2023 · 2 cites

TL;DR

This paper systematically evaluates Bayesian deep learning (BDL) methods on large-scale, real-world datasets with distribution shift, focusing on generalization and calibration, the benefits of ensembling, and the behavior of BDL methods when fine-tuning large pre-trained models.

Contribution

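It provides the first large-scale, systematic comparison of BDL methods on diverse, realistic tasks under distribution shift, including a systematic evaluation of BDL for fine-tuning large pre-trained models, and it extends popular single-mode posterior approximations to multiple modes by ensembling them.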
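To illustrate what extending a single-mode approximation to multiple modes means at prediction time, here is a minimal NumPy sketch. It assumes each ensemble member is a separately trained single-mode approximation (for example MC dropout or SWAG) from which posterior samples of logits have already been drawn, and it assumes an equally weighted mixture over members. The function names `softmax` and `multi_mode_predictive` are hypothetical and not taken from the paper's code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (class) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_mode_predictive(member_samples):
    """Equally weighted mixture over ensemble members (hypothetical helper).

    member_samples: list with one array per ensemble member, each of shape
    (num_posterior_samples, num_inputs, num_classes) holding logits sampled
    from that member's single-mode posterior approximation.
    Returns predictive probabilities of shape (num_inputs, num_classes).
    """
    # Per-member predictive: average the class probabilities over that member's samples.
    member_probs = [softmax(s).mean(axis=0) for s in member_samples]
    # Multi-mode predictive: uniform average over members (one mode per member).
    return np.mean(member_probs, axis=0)
```

Averaging probabilities rather than logits keeps the result a proper mixture of the per-mode predictive distributions, which is what lets disagreement between modes show up as extra predictive uncertainty.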

Findings

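01. Ensembling single-mode posterior approximations improves generalization and calibration by a significant margin, although ensembles show a failure mode when fine-tuning large transformer-based language models.

02. When fine-tuning large transformer-based language models, variational inference methods such as last-layer Bayes By Backprop outperform the other methods in accuracy.

03. In that fine-tuning setting, SWAG achieves the best calibration among the approximate inference algorithms.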
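For readers unfamiliar with SWAG, the sketch below shows only its diagonal variant (SWAG-Diag): it keeps running first and second moments of flattened weights collected along the SGD trajectory and samples weights from the resulting diagonal Gaussian. The full method additionally keeps a low-rank covariance term, which is omitted here. The class name `SwagDiag` and the `var_clamp` parameter are hypothetical, and when and how often to collect weights (e.g., once per epoch after a burn-in phase) is left to the caller as an assumption.

```python
import numpy as np

class SwagDiag:
    """Hypothetical SWAG-Diag helper over a flat parameter vector."""

    def __init__(self, num_params):
        self.n = 0
        self.mean = np.zeros(num_params)     # running mean of weights (the SWA solution)
        self.sq_mean = np.zeros(num_params)  # running mean of squared weights

    def collect(self, weights):
        # Incremental update of both moments with one new SGD iterate.
        self.n += 1
        self.mean += (weights - self.mean) / self.n
        self.sq_mean += (weights ** 2 - self.sq_mean) / self.n

    def sample(self, rng, var_clamp=1e-30):
        # Diagonal variance E[w^2] - E[w]^2, clamped to stay positive.
        var = np.maximum(self.sq_mean - self.mean ** 2, var_clamp)
        # Draw w ~ N(mean, diag(var)).
        return self.mean + np.sqrt(var) * rng.standard_normal(self.mean.shape)
```

Usage: call `collect` with the flattened weights at each chosen checkpoint, then call `sample(np.random.default_rng(0))` several times and average the resulting predictions to form the approximate posterior predictive.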

Abstract

Bayesian deep learning (BDL) is a promising approach to achieve well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that evaluates recent state-of-the-art methods on diverse, realistic, and challenging benchmark tasks in a systematic manner. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large, convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or under-confident, providing further insight into the behavior of the methods.…
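To make the signed calibration error concrete, here is a minimal NumPy sketch of equal-width binned ECE alongside a signed variant. It assumes the sign convention accuracy minus confidence per bin, so negative values indicate overconfidence and positive values underconfidence; the paper's exact definition and binning scheme may differ. The function name `calibration_errors` and the default of 10 bins are assumptions.

```python
import numpy as np

def calibration_errors(probs, labels, num_bins=10):
    """Return (ECE, signed ECE) for class probabilities of shape (n, num_classes)."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    # Assign each sample to an equal-width bin (b/num_bins, (b+1)/num_bins].
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, num_bins - 1)
    n = len(labels)
    ece, signed_ece = 0.0, 0.0
    for b in range(num_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        # Assumed convention: accuracy minus confidence (negative = overconfident).
        gap = correct[mask].mean() - confidences[mask].mean()
        weight = mask.sum() / n
        ece += weight * abs(gap)
        signed_ece += weight * gap
    return ece, signed_ece
```

When every bin errs in the same direction, the signed value equals the ECE up to sign; when over- and underconfident bins coexist, the two partially cancel, so reporting both shows the dominant direction without hiding the total miscalibration.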

Videos

Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift (talk recording, SlidesLive)

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning