If there is no underfitting, there is no Cold Posterior Effect
Yijie Zhang, Yi-Shan Wu, Luis A. Ortega, Andrés R. Masegosa

TL;DR
This paper demonstrates that the Cold Posterior Effect (CPE) in Bayesian deep learning occurs only when the model underfits. This offers a new perspective that links CPE to model underfitting rather than solely to misspecification.
Contribution
The authors theoretically establish that the Cold Posterior Effect arises only in the presence of underfitting, challenging previous views that linked CPE mainly to model misspecification.
Findings
CPE occurs only with underfitting models
No CPE when the model is well-specified and fits the data
Provides a theoretical link between underfitting and CPE
Abstract
The cold posterior effect (CPE) (Wenzel et al., 2020) in Bayesian deep learning shows that, for posteriors with a temperature $T<1$, the resulting posterior predictive can perform better than the Bayesian posterior ($T=1$). As the Bayesian posterior is known to be optimal under perfect model specification, many recent works have studied the presence of CPE as a model misspecification problem, arising from the prior and/or from the likelihood function. In this work, we provide a more nuanced understanding of the CPE, as we show that misspecification leads to CPE only when the resulting Bayesian posterior underfits. In fact, we theoretically show that if there is no underfitting, there is no CPE.
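To make the abstract's claim concrete, here is a toy sketch (not the paper's experiment) in a conjugate Gaussian model where everything is closed form. Assumptions, all introduced here: the temperature is applied to the likelihood only, $p_\lambda(\theta\mid D) \propto p(D\mid\theta)^{\lambda}\,p(\theta)$ with $\lambda = 1/T$, so cold posteriors correspond to $\lambda > 1$; observations are $y_i \sim \mathcal{N}(\theta^*, \sigma^2)$ with known $\sigma^2$; the prior is $\mathcal{N}(\mu_0, s_0^2)$. The function names and parameter values are hypothetical. Both priors below are centered at 0 while $\theta^* = 3$, so both are misspecified, but only the tight prior pulls the posterior mean far from the data mean (underfitting).

```python
import math

# Hypothetical toy setup: likelihood-tempered posterior over a Gaussian mean.
# p_lam(theta | D) ∝ N(D | theta, sigma2)^lam * N(theta | mu0, s02), lam = 1/T.

def tempered_posterior(lam, n, sigma2, mu0, s02):
    """Return (a, b, post_var) so that the posterior mean is a + b * ybar."""
    precision = 1.0 / s02 + lam * n / sigma2
    a = (mu0 / s02) / precision           # pull toward the prior mean
    b = (lam * n / sigma2) / precision    # fraction of the data mean retained
    return a, b, 1.0 / precision

def expected_test_nll(lam, theta_true, n, sigma2, mu0, s02):
    """Expected test NLL of the posterior predictive N(a + b*ybar, post_var + sigma2),
    averaged over ybar ~ N(theta_true, sigma2/n) and a fresh y* ~ N(theta_true, sigma2)."""
    a, b, post_var = tempered_posterior(lam, n, sigma2, mu0, s02)
    pred_var = post_var + sigma2
    bias = theta_true - (a + b * theta_true)               # systematic error of the mean
    mean_sq_err = sigma2 + bias ** 2 + b ** 2 * sigma2 / n  # E[(y* - predictive mean)^2]
    return 0.5 * math.log(2 * math.pi * pred_var) + mean_sq_err / (2 * pred_var)

LAMBDAS = [0.5, 1.0, 2.0, 5.0, 10.0, 50.0, 100.0]

def best_lambda(s02, theta_true=3.0, n=10, sigma2=1.0, mu0=0.0):
    """Grid-search lambda; returns (argmin, dict of expected test NLLs)."""
    losses = {lam: expected_test_nll(lam, theta_true, n, sigma2, mu0, s02)
              for lam in LAMBDAS}
    return min(losses, key=losses.get), losses

if __name__ == "__main__":
    for label, s02 in [("tight prior (underfits)", 0.01), ("weak prior (fits)", 1e4)]:
        lam_star, _ = best_lambda(s02)
        _, b, _ = tempered_posterior(1.0, 10, 1.0, 0.0, s02)
        print(f"{label}: at lambda=1 the posterior keeps {b:.3f} of the data mean; "
              f"best lambda on grid = {lam_star}")
```

With these settings, the tight prior's expected test NLL keeps improving as $\lambda$ grows (a CPE), whereas for the weak prior $\lambda = 1$ is the best value on the grid even though its mean is equally wrong. The test loss is computed as an expectation over training and test draws rather than from samples, so the comparison is deterministic.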
Peer Reviews
Decision · Submitted to ICLR 2024
1. The paper presents an elegant unification of previous explanations and observations of the CPE as being due to underfitting. It explains both the role of prior or likelihood misspecification as well as addresses the observed CPE when using data augmentations, which is really remarkable. 2. The experiments are cleverly designed and convey some great insights into the CPE. 3. The paper is well-written, and the authors do a great job of explaining ideas and results that are quite abstract. 4. Th
This might be the first time I have struggled to find weaknesses in a paper. While the usual do-more-experiments comment can always be used, I cannot think of particular experiments that would dramatically contribute to the paper. It seems to be all-round solid work.
- They address a highly significant issue, and their arguments hold profound implications. If properly substantiated, their work is poised to be regarded as a pivotal study. - Their notation is clean, and the paper is well-written.
$\bf{(Major)}$ According to the definition of CPE, as $\lambda$ increases, the test loss should decrease. Additionally, as $\lambda$ increases, the train loss always decreases irrespective of CPE. Up to this point, the arguments in Theorem 2 and Proposition 3 are accurate but rather self-evident. However, they have not demonstrated that the distribution whose existence was proven is a posterior distribution (which should be definable from a new likelihood or a new prior). Hence, according to th
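The premise that the training loss always decreases in $\lambda$ can be checked with a short derivation for the Gibbs training loss (the posterior-averaged empirical loss), assuming likelihood tempering $p_\lambda(\theta) \propto p(\theta)\,e^{-\lambda n \hat L(\theta)}$, where $\hat L(\theta)$ is the average training negative log-likelihood over $n$ examples. This notation is assumed here and may differ from the paper's.

```latex
\begin{aligned}
Z(\lambda) &= \int p(\theta)\, e^{-\lambda n \hat L(\theta)}\, d\theta, \\
\frac{d}{d\lambda}\,\mathbb{E}_{p_\lambda}\big[\hat L\big]
  &= \frac{d}{d\lambda}\,
     \frac{\int \hat L(\theta)\, p(\theta)\, e^{-\lambda n \hat L(\theta)}\, d\theta}{Z(\lambda)} \\
  &= -n\,\mathbb{E}_{p_\lambda}\big[\hat L^{2}\big] + n\,\mathbb{E}_{p_\lambda}\big[\hat L\big]^{2}
   = -n\,\mathrm{Var}_{p_\lambda}\big(\hat L\big) \;\le\; 0 .
\end{aligned}
```

The sign holds for any prior and likelihood, which matches the "irrespective of CPE" remark: a falling Gibbs training loss alone says nothing about the test loss.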
-Interesting take on the CPE problem that might shed light on new avenues for understanding why CPE exists and how to tackle it. -Potentially of good value if the argument is made clearer or presented in a better way.
-The argument is a bit unclear from my perspective. The authors argue that the problem is under-fitting, which comes from misspecification. So is the problem the under-fitting, or the misspecification that causes the under-fitting? -The paper seems to be stepping on previous results and works, claiming that misspecification of the prior or the likelihood leads to underfitting, and therefore underfitting is the problem that causes CPE. Well, if CPE is present when under-fitting is present then t
