Improving the repeatability of deep learning models with Monte Carlo dropout
Andreanne Lemay, Katharina Hoebel, Christopher P. Bridge, Brian Befano, Silvia De Sanjosé, Diden Egemen, Ana Cecilia Rodriguez, Mark Schiffman, John Peter Campbell, Jayashree Kalpathy-Cramer

TL;DR
This study demonstrates that using Monte Carlo dropout during testing enhances the repeatability, calibration, and sometimes accuracy of deep learning models across various medical image classification tasks, supporting more reliable clinical deployment.
Contribution
The paper provides a comprehensive evaluation of how Monte Carlo dropout affects model repeatability and calibration across multiple medical imaging tasks and architectures.
Findings
Monte Carlo dropout significantly improves repeatability in all tasks.
Repeatability gains plateau after about 20 Monte Carlo iterations.
Monte Carlo predictions enhance model calibration and sometimes accuracy.
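The idea behind these findings can be illustrated with a minimal sketch. It is not the authors' implementation: the toy logistic model, the dropout rate of 0.5, and the names mc_dropout_predict and spread_of_mc_mean are all assumptions made for illustration. Dropout is kept active at test time, several stochastic forward passes are averaged, and the run-to-run spread of the averaged prediction is compared for 1 and 20 iterations.

```python
import numpy as np


def mc_dropout_predict(x, w, b, n_iter=20, p_drop=0.5, rng=None):
    """Average n_iter stochastic forward passes of a toy logistic model.

    Hypothetical stand-in for a deep network: dropout is applied to the
    input features at test time, with inverted-dropout rescaling.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.empty(n_iter)
    for i in range(n_iter):
        mask = rng.random(x.shape) >= p_drop       # keep each feature with prob 1 - p_drop
        x_dropped = x * mask / (1.0 - p_drop)      # rescale so the expected input is unchanged
        logit = x_dropped @ w + b
        probs[i] = 1.0 / (1.0 + np.exp(-logit))    # sigmoid output for this pass
    return probs.mean(), probs.std()


# Toy input and weights (illustrative values only, not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=16)
w = rng.normal(size=16) * 0.5
b = 0.0


def spread_of_mc_mean(n_iter, n_repeats=200):
    """Standard deviation of the MC-averaged prediction across repeated runs."""
    means = [mc_dropout_predict(x, w, b, n_iter=n_iter, rng=rng)[0]
             for _ in range(n_repeats)]
    return float(np.std(means))


spread_1 = spread_of_mc_mean(1)
spread_20 = spread_of_mc_mean(20)
print(f"spread with 1 iteration:   {spread_1:.4f}")
print(f"spread with 20 iterations: {spread_20:.4f}")
```

Averaging more passes reduces the variance of the final prediction roughly in proportion to the number of passes, so each additional iteration helps less than the previous one. That diminishing return is consistent with a plateau, although the value of about 20 reported above is an empirical result of the paper, not something this toy example establishes.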
Abstract
The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. Repeatable models output predictions with low variation during independent tests carried out under similar conditions. During model development and evaluation, much attention is given to classification performance while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study the performance of binary, multi-class, ordinal, and regression models on four medical image classification tasks from public and private datasets: knee…
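As a rough sketch of what "low variation" between same-visit images could mean in practice, the snippet below compares a model's outputs on two images of the same patient. The metrics chosen here (mean absolute difference and Bland-Altman 95% limits of agreement), the function name paired_repeatability, and the example numbers are assumptions for illustration; the truncated abstract above does not state which metrics the paper uses.

```python
import numpy as np


def paired_repeatability(pred_a, pred_b):
    """Summarize agreement between predictions on two images per patient.

    pred_a[i] and pred_b[i] are model outputs for two images of patient i
    acquired during the same visit (hypothetical inputs).
    """
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    diff = pred_a - pred_b
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))                   # sample standard deviation
    return {
        "mean_abs_diff": float(np.mean(np.abs(diff))),
        # Bland-Altman 95% limits of agreement: bias +/- 1.96 * SD
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Made-up predicted probabilities for four patients, two images each.
first_image = [0.80, 0.30, 0.55, 0.10]
second_image = [0.70, 0.35, 0.50, 0.20]
result = paired_repeatability(first_image, second_image)
print(result)
```

Narrower limits of agreement and a smaller mean absolute difference both indicate a more repeatable model under this definition.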
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · AI in cancer detection
Methods: Concatenated Skip Connection · Batch Normalization · Max Pooling · Global Average Pooling · Residual Connection · Average Pooling · 1x1 Convolution · Dense Block · Softmax
