Improving the repeatability of deep learning models with Monte Carlo dropout

Andreanne Lemay; Katharina Hoebel; Christopher P. Bridge; Brian Befano; Silvia De Sanjosé; Diden Egemen; Ana Cecilia Rodriguez; Mark Schiffman; John Peter Campbell; Jayashree Kalpathy-Cramer

arXiv:2202.07562 · eess.IV · February 16, 2022 · 5 citations

TL;DR

This study demonstrates that using Monte Carlo dropout during testing enhances the repeatability, calibration, and sometimes accuracy of deep learning models across various medical image classification tasks, supporting more reliable clinical deployment.

Contribution

It provides a comprehensive evaluation of Monte Carlo dropout's effect on model repeatability and calibration across multiple medical imaging tasks and architectures.

Findings

01. Monte Carlo dropout significantly improves repeatability in all tasks.
02. Repeatability gains plateau after about 20 Monte Carlo iterations.
03. Monte Carlo predictions enhance model calibration and sometimes accuracy.
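
The findings above rest on keeping dropout active at test time and averaging several stochastic forward passes. The following is a minimal sketch in plain Python, assuming a hypothetical toy network: the weights `W_HIDDEN` and `W_OUT`, the input `X_EXAMPLE`, the dropout rate, and the function names are illustrative and are not taken from the paper or its repository.

```python
import math
import random

# Hypothetical toy parameters, chosen only for illustration.
W_HIDDEN = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5],
            [-0.6, 0.4, 0.9],
            [0.7, 0.1, 0.2]]
W_OUT = [1.0, -1.5, 0.8, 0.6]
X_EXAMPLE = [1.0, 0.5, 2.0]


def forward(x, w_hidden, w_out, drop_p, rng):
    """One stochastic forward pass of a tiny one-hidden-layer binary classifier."""
    # Hidden layer with ReLU activation.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Inverted dropout, deliberately left ON at inference time.
    keep = 1.0 - drop_p
    hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def mc_dropout_predict(x, w_hidden, w_out, n_iter=20, drop_p=0.5, seed=None):
    """Average n_iter stochastic passes; return (mean, standard deviation)."""
    rng = random.Random(seed)
    samples = [forward(x, w_hidden, w_out, drop_p, rng) for _ in range(n_iter)]
    mean = sum(samples) / n_iter
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n_iter)
    return mean, std


mean_prob, spread = mc_dropout_predict(X_EXAMPLE, W_HIDDEN, W_OUT, n_iter=20, seed=0)
print(f"MC dropout probability: {mean_prob:.3f} (std across passes: {spread:.3f})")
```

The default of 20 passes mirrors the plateau reported in the findings; every other value here is an assumption. In a real framework the same idea amounts to leaving the dropout layers in training mode during inference and averaging the outputs.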

Abstract

The integration of artificial intelligence into clinical workflows requires reliable and robust models. Repeatability is a key attribute of model robustness. Repeatable models output predictions with low variation during independent tests carried out under similar conditions. During model development and evaluation, much attention is given to classification performance while model repeatability is rarely assessed, leading to the development of models that are unusable in clinical practice. In this work, we evaluate the repeatability of four model types (binary classification, multi-class classification, ordinal classification, and regression) on images that were acquired from the same patient during the same visit. We study the performance of binary, multi-class, ordinal, and regression models on four medical image classification tasks from public and private datasets: knee…
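
The abstract describes repeatability as low variation between predictions on images acquired from the same patient during the same visit. One simple way to quantify that idea is the per-patient standard deviation of the model outputs, sketched below. This is an illustrative proxy only and not necessarily the metric used in the paper; the function names and the input layout (a dictionary from patient ID to a list of predictions) are assumptions.

```python
from statistics import mean, pstdev


def per_patient_spread(predictions):
    """Standard deviation of the outputs for each patient with repeated images.

    predictions: dict mapping a patient ID to the list of model outputs
    obtained on that patient's images from a single visit.
    Patients with fewer than two images are skipped.
    """
    return {
        patient: pstdev(outputs)
        for patient, outputs in predictions.items()
        if len(outputs) >= 2
    }


def mean_spread(predictions):
    """Average per-patient spread; lower values mean more repeatable outputs."""
    return mean(per_patient_spread(predictions).values())


# Hypothetical predictions, for illustration only.
example = {"patient_a": [0.2, 0.4], "patient_b": [0.5, 0.5, 0.5], "patient_c": [0.9]}
print(per_patient_spread(example))
print(mean_spread(example))
```

Comparing this summary for a model with and without Monte Carlo dropout gives a rough view of whether the averaged predictions vary less across repeated images.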

Code & Models

Repositories

andreanne-lemay/gray_zone_assessment (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · AI in cancer detection

Methods: Concatenated Skip Connection · Batch Normalization · Max Pooling · Global Average Pooling · Residual Connection · Average Pooling · 1x1 Convolution · Dense Block · Softmax