On Uniform, Bayesian, and PAC-Bayesian Deep Ensembles

Nick Hauptvogel; Christian Igel

arXiv:2406.05469·cs.LG·January 7, 2025


Open Access · 4 Reviews

TL;DR

This paper compares different ensemble methods for deep neural networks, showing that PAC-Bayesian weighted ensembles optimized with tandem loss outperform Bayesian and uniform ensembles in generalization and robustness.

Contribution

It weights ensemble members by minimizing a second-order PAC-Bayesian generalization bound based on the tandem loss, which accounts for correlations between models, and reports better generalization than Bayes ensembles.

Findings

01

PAC-Bayesian weighted ensembles outperform Bayesian ensembles.

02

Tandem loss optimization enhances robustness against correlated models.

03

State-of-the-art Bayesian ensembles do not surpass simple uniform deep ensembles.

Abstract

It is common practice to combine deep neural networks into ensembles. These deep ensembles can benefit from the cancellation of errors effect: Errors by ensemble members may average out, leading to better generalization performance than each individual network. Bayesian neural networks learn a posterior distribution over model parameters, and sampling and weighting networks according to this posterior yields an ensemble model referred to as a Bayes ensemble. This study reviews the argument that neither the sampling nor the weighting in Bayes ensembles are particularly well suited for increasing generalization performance, as they do not support the cancellation of errors effect. In contrast, we show that a weighted average of models, where the weights are optimized by minimizing a second-order PAC-Bayesian generalization bound, can improve generalization. It is crucial that the…
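The role of the tandem loss in the abstract can be illustrated with a small sketch. It assumes the standard second-order result that the error of a ρ-weighted majority vote is at most 4 times the ρ²-expected tandem loss, where the tandem loss of two members is the probability that both misclassify the same example. The function names, the toy predictions, and the uniform weights below are hypothetical, and the sketch only evaluates the empirical tandem term; it omits the complexity (KL) term of the full PAC-Bayesian bound and does not optimize the weights.

```python
def tandem_loss_matrix(preds, labels):
    """Empirical tandem loss: fraction of examples members i and j BOTH get wrong."""
    n = len(labels)
    # errors[i][k] is True when member i misclassifies example k
    errors = [[p != y for p, y in zip(member, labels)] for member in preds]
    m = len(preds)
    return [[sum(a and b for a, b in zip(errors[i], errors[j])) / n
             for j in range(m)] for i in range(m)]


def expected_tandem_loss(weights, tandem):
    """Weighted double sum over all pairs (i, j) of w_i * w_j * tandem[i][j]."""
    m = len(weights)
    return sum(weights[i] * weights[j] * tandem[i][j]
               for i in range(m) for j in range(m))


def weighted_vote_error(weights, preds, labels):
    """Empirical error of the weighted majority vote."""
    wrong = 0
    for k, y in enumerate(labels):
        votes = {}
        for w, member in zip(weights, preds):
            votes[member[k]] = votes.get(member[k], 0.0) + w
        if max(votes, key=votes.get) != y:
            wrong += 1
    return wrong / len(labels)


# Toy data (hypothetical): six examples, true label always 0.
labels = [0, 0, 0, 0, 0, 0]
a = [1, 1, 0, 0, 0, 0]  # wrong on examples 0 and 1
b = [0, 0, 1, 1, 0, 0]  # wrong on examples 2 and 3
c = [0, 0, 0, 0, 1, 1]  # wrong on examples 4 and 5

diverse = [a, b, c]     # errors never coincide, so they cancel in the vote
correlated = [a, a, c]  # two members share the same errors
uniform = [1 / 3, 1 / 3, 1 / 3]

for name, ensemble in [("diverse", diverse), ("correlated", correlated)]:
    tandem = tandem_loss_matrix(ensemble, labels)
    bound_term = 4 * expected_tandem_loss(uniform, tandem)
    vote_err = weighted_vote_error(uniform, ensemble, labels)
    print(name, "vote error:", vote_err, "4 * tandem term:", bound_term)
```

Every member has the same individual error rate in both ensembles, yet the ensemble with shared errors has a larger tandem term and a worse majority vote. This is the sense in which a criterion built on pairwise tandem losses can penalize correlated members, which individual error rates alone cannot do.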

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 3

Strengths

1. Overall, this paper is well-written and the main hypotheses or claims are clearly presented.
2. Experimental results support these hypotheses.

Weaknesses

1. Theoretical results are lacking to support these hypotheses or claims.
2. The authors have not proposed a new method or theory, resulting in little contribution.

Reviewer 02 · Rating 3 · Confidence 3

Strengths

- Studies deep ensembles, which is an important and interesting problem.
- Brings together different types of ensembles studied in the literature in a unified framework.
- Includes empirical results supporting the theoretical claims.

Weaknesses

* Using the Bernstein-von Mises theorem to show that a Bayes ensemble corresponds to a single model and hence cannot capture the cancellation of errors effect is interesting. But as mentioned by the paper, this has already been shown by the Masegosa paper.
* Not clear what this paper is adding over the Masegosa paper.
* This work claims that “Our results on four datasets show that complex Bayesian approximate inference methods can often be surpassed by more efficient simple deep e

Reviewer 03 · Rating 5 · Confidence 3

Strengths

* The paper is well written.
* The authors have generally performed a thorough literature review.
* The method and general area of research is of high interest in the community (methods for aiding model generalisation, robustness and uncertainty quantification).

Weaknesses

I will outline the weaknesses here and elaborate on them in the questions.
* Lack of novelty. This is the main reason for the low score.
* Lacking experimental evaluation metrics. Given the lack of novelty, I would strongly argue that experimental evaluation should be much larger for this work to be accepted.

Reviewer 04 · Rating 3 · Confidence 3

Strengths

1. This work focuses on an interesting problem of understanding the mechanism of ensembles.
2. It carries out extensive reviews of theoretical works regarding ensembles and provides a theoretical analysis of why Bayesian ensembles fail to select better weights compared with uniform ensembles.
3. The optimization of weights through the second-order bound and tandem loss is an interesting approach.

Weaknesses

1. The derivation of eq. 3 is missing and the authors do not provide much insight into this equation. I suggest that the presentation and motivation can benefit from elaborating more on how this upper bound should be understood and what the implications could be.
2. The presentation does not properly highlight the contribution of this work. Section 3 summarizes many existing works and approaches while leaving its own contribution unclear. This section should be reorganized for a better presentat

Taxonomy

Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications

Methods: Deep Ensembles