On the Importance of Strong Baselines in Bayesian Deep Learning
Jishnu Mukhoti, Pontus Stenetorp, Yarin Gal

TL;DR
This paper emphasizes the critical importance of using strong, well-trained baselines in Bayesian Deep Learning experiments, revealing that inconsistent evaluation protocols can lead to misleading conclusions about method performance.
Contribution
The paper identifies a common experimental flaw in the UCI regression benchmark: methods trained to convergence are compared against baselines trained for only a fixed number of iterations. Evaluating both under the same settings can change which method appears superior.
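To make the flaw concrete, here is a toy sketch contrasting a fixed iteration budget with training until the loss stops improving. It is not the authors' code: the one-parameter model, the learning rate, the `tol` stopping rule, and the function names `mse` and `train` are all assumptions chosen for illustration.

```python
def mse(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.01, max_iters=None, tol=None):
    """Gradient descent on y = w * x.

    Stops after `max_iters` steps (fixed budget), or once the loss
    improvement between steps falls below `tol` (a simple, assumed
    convergence criterion). At least one of the two must be given.
    """
    if max_iters is None and tol is None:
        raise ValueError("give max_iters, tol, or both")
    w = 0.0
    prev = mse(w, xs, ys)
    steps = 0
    while max_iters is None or steps < max_iters:
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        steps += 1
        loss = mse(w, xs, ys)
        if tol is not None and prev - loss < tol:
            break
        prev = loss
    return w, mse(w, xs, ys), steps


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
_, loss_fixed, steps_fixed = train(xs, ys, max_iters=5)   # fixed budget
_, loss_conv, steps_conv = train(xs, ys, tol=1e-10)       # run to convergence
print(steps_fixed, loss_fixed)
print(steps_conv, loss_conv)
```

The same model looks much worse under the fixed budget than when run to convergence, which is the kind of gap that can distort a comparison if only one side is trained to convergence.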
Findings
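Monte Carlo dropout, evaluated under the same experimental settings as its competitors, outperforms or performs competitively with methods that claimed to be superior to it when they were introduced.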
Inconsistent training protocols can misrepresent the effectiveness of Bayesian methods.
Proper benchmarking requires identical experimental setups for fair comparison.
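For readers unfamiliar with the baseline, the following is a minimal sketch of Monte Carlo dropout prediction for regression: dropout stays active at test time, and several stochastic forward passes are averaged. It is not taken from the paper. The single-hidden-layer network, the hand-written weights, the precision parameter `tau`, and the function names are assumptions for illustration only.

```python
import random


def stochastic_forward(x, w1, b1, w2, b2, p_drop, rng):
    """One forward pass of a 1-hidden-layer ReLU regressor with dropout ON."""
    keep = 1.0 - p_drop
    hidden = []
    for w_row, b in zip(w1, b1):
        pre = sum(wi * xi for wi, xi in zip(w_row, x)) + b
        h = pre if pre > 0.0 else 0.0
        # Inverted dropout: drop the unit with probability p_drop,
        # otherwise rescale it by 1 / keep.
        mask = 1.0 if rng.random() < keep else 0.0
        hidden.append(h * mask / keep)
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2


def mc_dropout_predict(x, w1, b1, w2, b2, p_drop=0.1, n_samples=100,
                       tau=1.0, seed=0):
    """Predictive mean and variance from n_samples stochastic passes.

    `tau` is an assumed model precision; 1 / tau is added to the sample
    variance as observation noise.
    """
    rng = random.Random(seed)
    samples = [stochastic_forward(x, w1, b1, w2, b2, p_drop, rng)
               for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / n_samples + 1.0 / tau
    return mean, var


# Hypothetical weights for a 2-input, 2-hidden-unit network.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [1.0, 1.0], 0.5
mean, var = mc_dropout_predict([1.0, 2.0], w1, b1, w2, b2, p_drop=0.5)
print(mean, var)
```

With `p_drop=0.0` every pass is identical, so the predictive variance reduces to the noise term `1 / tau`; any spread beyond that comes from the dropout masks.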
Abstract
Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals. Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws can slip by both authors and reviewers. One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets. However, in this experiment, models which have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations. We find that a well-established baseline, Monte Carlo dropout, when evaluated under the same experimental settings, shows significant improvements. In fact, the baseline outperforms or performs competitively with methods that claimed to be superior to the very same baseline method when they were introduced. Hence, by exposing this…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Machine Learning and Data Classification
