Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification
Kaizheng Wang, Fabio Cuzzolin, Keivan Shariatmadar, David Moens, Hans Hallez

TL;DR
This paper introduces a credal wrapper method for model averaging in Bayesian neural networks and deep ensembles, enhancing uncertainty estimation and calibration in classification, especially for out-of-distribution detection.
Contribution
The paper proposes a novel credal wrapper approach that constructs probability intervals from model ensembles, improving uncertainty quantification in classification tasks.
Findings
Outperforms BNN and DE baselines in uncertainty estimation.
Achieves lower expected calibration error on corrupted data.
Effective across various datasets and neural network architectures.
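For readers unfamiliar with the calibration metric named above, the following is a minimal sketch of the standard equal-width-bin estimator of expected calibration error (ECE). The function name, the choice of 10 bins, and the input layout are assumptions made for illustration; they are not taken from the paper, and the sketch does not reproduce the paper's evaluation protocol.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE estimate (illustrative; n_bins=10 is an assumption).

    confidences: top-class probability for each sample, in (0, 1].
    correct:     1 if the top-class prediction was right, else 0.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0 is never binned,
        # which cannot occur for a top-class probability.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            # Weight each bin's accuracy/confidence gap by its share of samples.
            ece += in_bin.mean() * gap
    return ece
```

A lower value means that the reported confidence tracks the observed accuracy more closely.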
Abstract
This paper presents an innovative approach, called credal wrapper, to formulating a credal set representation of model averaging for Bayesian neural networks (BNNs) and deep ensembles (DEs), capable of improving uncertainty estimation in classification tasks. Given a finite collection of single predictive distributions derived from BNNs or DEs, the proposed credal wrapper approach extracts an upper and a lower probability bound per class, acknowledging the epistemic uncertainty due to the availability of a limited number of distributions. Such probability intervals over classes can be mapped onto a convex set of probabilities (a credal set) from which, in turn, a unique prediction can be obtained using a transformation called intersection probability transformation. In this article, we conduct extensive experiments on several out-of-distribution (OOD) detection benchmarks, encompassing…
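The two steps described in the abstract (per-class probability bounds, then a single prediction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bounds are taken as the per-class minimum and maximum over the sampled predictive distributions, and the single prediction uses the intersection probability formula for probability intervals as it is usually stated, p(c) = l(c) + β (u(c) − l(c)) with β = (1 − Σ l) / Σ (u − l).

```python
import numpy as np

def probability_intervals(probs):
    """Per-class lower and upper bounds over a set of predictive distributions.

    probs: array of shape (n_models, n_classes); each row sums to 1.
    Taking min/max over models is an assumption made for this sketch.
    """
    probs = np.asarray(probs, dtype=float)
    return probs.min(axis=0), probs.max(axis=0)

def intersection_probability(lower, upper):
    """Single distribution inside the intervals [lower, upper]."""
    width = upper - lower
    total_width = width.sum()
    if np.isclose(total_width, 0.0):
        # All models agree, so the intervals collapse to one distribution.
        return lower / lower.sum()
    # Share out the mass not covered by the lower bounds in proportion
    # to each class's interval width; the result sums to 1.
    beta = (1.0 - lower.sum()) / total_width
    return lower + beta * width

# Three hypothetical ensemble members predicting over three classes.
samples = [[0.7, 0.2, 0.1],
           [0.5, 0.3, 0.2],
           [0.6, 0.1, 0.3]]
lower, upper = probability_intervals(samples)
prediction = intersection_probability(lower, upper)
```

For these made-up inputs the bounds are [0.5, 0.1, 0.1] and [0.7, 0.3, 0.3], β is 0.5, and the resulting prediction is [0.6, 0.2, 0.2]. The computation is closed-form, which is consistent with the reviewer remark that the transformation has a deterministic solution.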
Peer Reviews
Decision: ICLR 2025 Spotlight
1. The idea of using probability intervals to reflect the epistemic uncertainty is appealing.
2. The intersection probability transformation has a deterministic solution, and thus it is as efficient as other alternatives in model averaging.
3. Experimental results are strong and comprehensive.
1. Beyond the table results, I would expect a more direct demonstration of why intersection probability is better than naive averaging. For example, can you visualize the uncertainty on simple regression tasks?
2. Computing epistemic uncertainty using probability intervals is costly. I would expect a discussion of the extra computational cost compared with entropy.
- The proposed method is simple yet effective, as demonstrated by the extensive experiments.
- The proposed approximation algorithm, PIA, effectively approximates the original optimization problem and ultimately makes the idea practically tractable.
- The proposed method is easy to "wrap" around various methods as long as they provide multiple samples of model predictions.
- Experimental details are well documented.
Overall, the paper is a bit tricky to follow because many details are densely presented. It would be clearer if some parts (e.g., detailed experiment setups, ablation studies on the number of predictive samples, etc.) were moved to the appendix. Some parts of the appendix, e.g., the A.6 ablation studies on the PIA parameter $J$, feel more important to me than some of the ablation studies listed in the main body. Besides, although I am unaware of existing similar works, it would be beneficial if
- The paper is really well-written and easy to follow.
- The proposed idea is really simple and is highly applicable in many settings.
- The experiments are nicely structured and follow the evaluation protocol of the literature.
- The authors clearly highlight the approach's limitations stemming from its computational complexity.
- **Related Work:** The approach is presented as innovative but lacks clear placement within the existing literature. While there is significant discussion of credal sets in ML, the authors only briefly touch on their application in deep learning, mentioning computational complexity as a key challenge. After a brief search I noticed some papers focusing on credal sets in deep learning [1, 2] and was wondering why these are not discussed in the related work. More importantly, how is this paper impr
Taxonomy
Topics: Fault Detection and Control Systems · Anomaly Detection Techniques and Applications
Methods: Deep Ensembles · Sparse Evolutionary Training · Batch Normalization · Pointwise Convolution · Dropout · Dense Connections · Average Pooling · Sigmoid Activation
