If there is no underfitting, there is no Cold Posterior Effect

Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa

arXiv:2310.01189 · stat.ML · October 3, 2023

TL;DR

This paper demonstrates that the Cold Posterior Effect in Bayesian deep learning occurs only when the model underfits, providing a new perspective that links CPE to model underfitting rather than solely to misspecification.

Contribution

The authors theoretically establish that the Cold Posterior Effect arises only in the presence of underfitting, challenging previous views that linked CPE mainly to model misspecification.

Findings

01. CPE occurs only with underfitting models

02. No CPE when the model is well-specified and fits the data

03. Provides a theoretical link between underfitting and CPE

Abstract

The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T < 1$, the resulting posterior predictive could have better performances than the Bayesian posterior ($T = 1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE.
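To make the role of the temperature concrete, the following is a minimal sketch on a toy conjugate model, not the authors' experimental setup. It assumes likelihood tempering, $p^{\lambda}(\theta \mid D) \propto p(D \mid \theta)^{\lambda}\, p(\theta)$ with $\lambda = 1/T$ (the abstract speaks of $T$, the reviews of $\lambda$; equating the two this way is an assumption here). The Gaussian model, the function names `tempered_posterior` and `gibbs_train_loss`, and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def tempered_posterior(ys, lam, sigma2=1.0, tau2=1.0):
    """Mean and variance of p^lam(theta | D) for the toy model
    y_i ~ N(theta, sigma2), theta ~ N(0, tau2), likelihood raised to lam."""
    n = len(ys)
    precision = 1.0 / tau2 + lam * n / sigma2
    mean = (lam * sum(ys) / sigma2) / precision
    return mean, 1.0 / precision


def gibbs_train_loss(ys, lam, sigma2=1.0, tau2=1.0):
    """Expected per-sample negative log-likelihood on the training data,
    averaged over theta drawn from the tempered posterior (closed form)."""
    m, v = tempered_posterior(ys, lam, sigma2, tau2)
    # E[(y - theta)^2] = (y - m)^2 + v when theta ~ N(m, v)
    mse = sum((y - m) ** 2 for y in ys) / len(ys)
    return 0.5 * math.log(2 * math.pi * sigma2) + (mse + v) / (2 * sigma2)


rng = random.Random(0)
ys = [rng.gauss(2.0, 1.0) for _ in range(20)]  # synthetic data, illustration only

lams = [0.25, 0.5, 1.0, 2.0, 4.0]
losses = [gibbs_train_loss(ys, lam) for lam in lams]
for lam, loss in zip(lams, losses):
    print(f"lambda={lam:<5} T={1 / lam:<5} train loss={loss:.4f}")
```

In this toy setting the training loss shrinks as $\lambda$ grows (that is, as $T$ drops below 1), because the tempered posterior concentrates around the sample mean. Whether the test loss also improves is the separate question that defines the CPE, and this sketch does not address it.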

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

1. The paper presents an elegant unification of previous explanations and observations of the CPE as being due to underfitting. It explains both the role of prior or likelihood misspecification as well as addresses the observed CPE when using data augmentations, which is really remarkable.
2. The experiments are cleverly designed and convey some great insights into the CPE.
3. The paper is well-written, and the authors do a great job of explaining ideas and results that are quite abstract.
4. Th

Weaknesses

This might be the first time I have struggled to find weaknesses in a paper. While the usual do-more-experiments comment can always be used, I cannot think of particular experiments that would dramatically contribute to the paper. It seems to be all-round solid work.

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- They address a highly significant issue, and their arguments hold profound implications. If properly substantiated, their work is poised to be regarded as a pivotal study.
- Their notation is clean, and the paper is well written.

Weaknesses

(Major) According to the definition of CPE, as $\lambda$ increases, the test loss should decrease. Additionally, as $\lambda$ increases, the train loss always decreases irrespective of CPE. Up to this point, the arguments in Theorem 2 and Proposition 3 are accurate but rather self-evident. However, they have not demonstrated that the distribution, whose existence was proven, becomes a posterior distribution (which should be definable from a new likelihood or a new prior). Hence, according to th

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- Interesting take on the CPE problem that might shed light on new avenues for understanding why CPE exists and how to tackle it.
- Potentially of good value if the argument is made clearer or presented in a better way.

Weaknesses

- The argument is a bit unclear from my perspective. The authors argue that the problem is under-fitting, which comes from misspecification. So is the problem the under-fitting, or the misspecification which causes the under-fitting?
- The paper seems to be stepping on previous results and works, claiming that the misspecification of the prior or the likelihood leads to underfitting, and therefore underfitting is the problem that causes CPE. Well if CPE is present when under-fitting is present then t

Taxonomy

Topics: Machine Learning in Materials Science · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference