On Joint Regularization and Calibration in Deep Ensembles
Laurits Fredsgaard, Mikkel N. Schmidt

TL;DR
This paper explores how joint tuning of hyperparameters in deep ensembles can enhance performance and calibration, proposing a practical holdout strategy to balance data use and evaluation.
Contribution
The paper introduces a partially overlapping holdout method and demonstrates the benefits of jointly tuning weight decay, temperature, and early stopping in deep ensembles.
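The summary does not say how the partially overlapping holdout sets are built, so the following is only a hypothetical construction for intuition, not the authors' procedure. It assumes the data is split into as many folds as there are ensemble members, and that each member holds out a few cyclically adjacent folds. Every sample is then unseen by several members at once (allowing a joint evaluation with those members), while each member still trains on most of the data. The names `overlapping_holdouts`, `members_holding_out`, and `folds_per_member` are illustrative.

```python
def overlapping_holdouts(n_samples, n_members, folds_per_member=2):
    """Hypothetical scheme: member m holds out folds m, m+1, ... (cyclically).

    Returns a list with one set of held-out sample indices per member.
    """
    # Assign sample i to fold i % n_members (a simple strided split).
    folds = [list(range(f, n_samples, n_members)) for f in range(n_members)]
    holdouts = []
    for m in range(n_members):
        held = set()
        for k in range(folds_per_member):
            held.update(folds[(m + k) % n_members])
        holdouts.append(held)
    return holdouts


def members_holding_out(holdouts, index):
    """Members that never trained on `index` and can be evaluated jointly on it."""
    return [m for m, held in enumerate(holdouts) if index in held]


# Example: 4 members, 20 samples, each member holds out 2 of the 4 folds.
holdouts = overlapping_holdouts(n_samples=20, n_members=4, folds_per_member=2)
train_sizes = [20 - len(h) for h in holdouts]
```

In this sketch, `folds_per_member` controls the trade-off named in the contribution: a larger value gives more members per held-out sample for joint evaluation, at the cost of less training data per member.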
Findings
Joint tuning generally matches or improves predictive performance and calibration.
Overlapping holdout balances evaluation and training data.
Effectiveness varies across tasks and metrics.
Abstract
Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training,…
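To make the distinction between individual and joint tuning concrete, here is a minimal sketch for the temperature hyperparameter alone. It assumes the ensemble averages member softmax probabilities, that tuning is a grid search over temperatures minimizing negative log-likelihood (NLL) on held-out data, and that joint tuning uses one temperature shared by all members. These choices, the synthetic logits, and all function names are assumptions for illustration; the paper's actual procedure may differ.

```python
import math
import random


def softmax(logits, temperature):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def member_nll(member_logits, labels, temperature):
    """Mean NLL of a single member at the given temperature."""
    losses = [-math.log(softmax(z, temperature)[y])
              for z, y in zip(member_logits, labels)]
    return sum(losses) / len(losses)


def ensemble_nll(all_logits, labels, temperatures):
    """Mean NLL of the ensemble that averages member probabilities."""
    n_members = len(all_logits)
    total = 0.0
    for i, y in enumerate(labels):
        p = sum(softmax(all_logits[m][i], temperatures[m])[y]
                for m in range(n_members)) / n_members
        total -= math.log(p)
    return total / len(labels)


GRID = [0.25 * k for k in range(1, 21)]  # candidate temperatures 0.25 .. 5.0


def tune_individually(all_logits, labels):
    """Each member picks the temperature minimizing its own NLL."""
    return [min(GRID, key=lambda t: member_nll(z, labels, t))
            for z in all_logits]


def tune_jointly(all_logits, labels):
    """One shared temperature chosen to minimize the ensemble NLL."""
    n = len(all_logits)
    best = min(GRID, key=lambda t: ensemble_nll(all_logits, labels, [t] * n))
    return [best] * n


# Synthetic held-out logits (illustrative only): 3 members, 200 samples, 3 classes.
random.seed(0)
n_members, n_samples, n_classes = 3, 200, 3
labels = [random.randrange(n_classes) for _ in range(n_samples)]
all_logits = [
    [[random.gauss(0, 3) + (4.0 if c == y else 0.0) for c in range(n_classes)]
     for y in labels]
    for _ in range(n_members)
]

individual_ts = tune_individually(all_logits, labels)
joint_ts = tune_jointly(all_logits, labels)
print("individual:", individual_ts, ensemble_nll(all_logits, labels, individual_ts))
print("joint:     ", joint_ts, ensemble_nll(all_logits, labels, joint_ts))
```

The only difference between the two routines is the objective: the individual version scores each member in isolation, while the joint version scores the averaged prediction. Averaging already smooths the members' outputs, so the temperature that is best for a member on its own need not be best for the ensemble.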
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Advanced Neural Network Applications
