A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors
Olivier Laurent, Emanuel Aldea, Gianni Franchi

TL;DR
This paper conducts a large-scale exploration of Bayesian Neural Network posteriors in real-world vision tasks, analyzing symmetries, modes, and visualization methods to improve understanding and uncertainty quantification.
Contribution
It introduces a comprehensive analysis of weight-space symmetries in BNNs, linking them to posterior quality and uncertainty, and provides a large-scale dataset for further research.
Findings
Weight-space symmetries significantly affect posterior interpretation.
Permutation symmetries duplicate modes, impacting posterior analysis.
L2 regularization relates to scaling symmetries, challenging previous views.
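The two symmetries named in the findings can be illustrated with a minimal NumPy sketch. It assumes a toy two-layer ReLU network, not the architectures studied in the paper; the layer sizes, variable names, and the `mlp` and `l2` helpers are hypothetical. Permuting hidden units, or rescaling each hidden unit by a positive factor while dividing its outgoing weights by the same factor, leaves the network output unchanged. The L2 penalty is also unchanged by the permutation but generally not by the rescaling, which is one way to see why L2 regularization interacts with scaling symmetries.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Toy two-layer ReLU network: W2 @ relu(W1 @ x + b1) + b2."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

def l2(*params):
    """Sum of squared entries over all parameter arrays (the L2 penalty)."""
    return sum(float(np.sum(p ** 2)) for p in params)

# Hypothetical toy sizes, chosen only for illustration.
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)
x = rng.normal(size=d_in)

y = mlp(x, W1, b1, W2, b2)

# Permutation symmetry: reorder hidden units (rows of W1 and b1),
# and reorder the matching columns of W2.
perm = rng.permutation(d_hidden)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

# Scaling symmetry: multiply each hidden unit's incoming weights and bias
# by lam > 0 and divide its outgoing weights by lam. This relies on
# relu(lam * z) == lam * relu(z) for lam > 0.
lam = rng.uniform(0.5, 2.0, size=d_hidden)
W1_s, b1_s, W2_s = lam[:, None] * W1, lam * b1, W2 / lam
y_scaled = mlp(x, W1_s, b1_s, W2_s, b2)

l2_orig = l2(W1, b1, W2, b2)
l2_perm = l2(W1[perm], b1[perm], W2[:, perm], b2)
l2_scaled = l2(W1_s, b1_s, W2_s, b2)

print("output unchanged by permutation:", np.allclose(y, y_perm))
print("output unchanged by scaling:    ", np.allclose(y, y_scaled))
print("L2 penalty (original, permuted, scaled):", l2_orig, l2_perm, l2_scaled)
```

In this sketch the likelihood is identical at all three weight configurations, so any difference in posterior density between the original and the rescaled weights comes only from the prior (here, the L2 term).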
Abstract
The distribution of the weights of modern deep neural networks (DNNs), crucial for uncertainty quantification and robustness, is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this end, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the…
Peer Reviews
Decision: ICLR 2024 poster
1. Originality in symmetry analysis: The paper's exploration of weight-space symmetries, particularly permutation and scaling symmetries, in deep Bayesian neural networks is highly original. It builds upon existing studies of DNN loss landscapes, like those by Li et al. (2018) and Fort & Jastrzebski (2019), and extends these concepts to the Bayesian context. The distinction between permutation and scaling symmetries and their unique impacts on the posterior is a novel contribution (Section 1).
1. Limited scope in empirical validation: While the dataset released is extensive, the paper's empirical validation primarily focuses on vision tasks and specific architectures like ResNet-18. This raises questions about the generalizability of the findings across different types of neural network architectures and tasks. Expanding the empirical validation to include a broader range of architectures would strengthen the findings. 2. Need for further exploration of symmetries in training: The pa
1. I find the mathematical treatment of the scaling and permutation symmetries, as well as their contribution to the posterior, quite exciting. It may not be overly novel, but the exposition is interesting and inspiring. 2. The empirical analysis is exciting. The study of the different posterior approximations provides a good overview of the effects of the approximations, and it feels like there's even more to be gained from Table 1 than what the authors discuss. Furthermore, section 5 appears q
1. The paper seems to lack a specific focus. While the theoretical and experimental parts of the paper are both exciting, they seem largely disconnected. The developed mathematical formalism does not appear to be used for anything, and it is even unclear what these definitions buy us in terms of understanding BNN posteriors. The experiments in section 4 do not appear to consider symmetries at all, and whereas those of section 5 do, the link to the definitions in section 3 is unclear. To me, the
1. Connecting Bayesian neural networks with the recently obtained insights into the symmetries of the loss landscape of standard neural networks is a timely contribution and highly relevant to the field. The work is very well-written and mostly easy to follow. 2. The experiments are impressive and on a very large scale, which is often missing in the BNN literature. I especially appreciate the different datasets investigated in this work. I also find the separation into single and multi-mode app
1. I think the first part of the paper regarding the symmetries is a bit extensive and insights from it are somewhat limited. To the best of my knowledge, it has been known for a while that the BNN posterior effectively consists of a mixture over such symmetries, so I don’t see much novelty in equation (8). While it sets up the subsequent discussion very nicely, I wouldn’t advertise this as one of the core contributions of this work. The in-depth study of the role of the scaling symmetry on the
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
