CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings

Pranav Rajpurkar; Anirudh Joshi; Anuj Pareek; Andrew Y. Ng; Matthew P. Lungren

arXiv:2102.08660·eess.IV·February 23, 2021

TL;DR

This study evaluates how well deep learning models for chest X-ray interpretation hold up, without fine-tuning, on smartphone photos of chest X-rays and on external datasets, and compares their performance under these shifts with that of radiologists.

Contribution

It systematically assesses how 8 independently developed models generalize to photos of chest X-rays and to external data without fine-tuning, highlighting factors that affect robustness.

Findings

01

Some models perform comparably to radiologists under distribution shifts.

02

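On photos, all 8 models show a statistically significant drop in performance, though only 3 perform significantly worse than radiologists on average; on the external dataset, some models outperform radiologists.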

03

Model robustness varies significantly depending on training and dataset characteristics.

Abstract

Recent advances in training deep learning models have demonstrated the potential to provide accurate chest X-ray interpretation and increase access to radiology expertise. However, poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation. In this study, we measured the diagnostic performance for 8 different chest X-ray models when applied to (1) smartphone photos of chest X-rays and (2) external datasets without any finetuning. All models were developed by different groups and submitted to the CheXpert challenge, and re-applied to test datasets without further tuning. We found that (1) on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance, but only 3 performed significantly worse than radiologists on average, and (2) on the external set, none of the models performed statistically…
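The abstract reports statistically significant drops in performance but, as excerpted here, does not name the metric or the statistical test. As a rough illustration only, the sketch below shows one common way such a comparison can be made: a paired bootstrap confidence interval on the difference in a metric between two sets of predictions for the same cases (for example, a model on standard images versus the same model on photos). Accuracy is used as a stand-in metric, and the function names, parameters, and synthetic data are all hypothetical; this is not the authors' procedure.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match the labels (stand-in metric)."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def paired_bootstrap_diff(labels, preds_a, preds_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(preds_a) - metric(preds_b).

    Cases are resampled with replacement, and the same resampled indices
    are used for both prediction sets so the comparison stays paired.
    Returns (point_estimate, ci_lower, ci_upper).
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        a = [preds_a[i] for i in idx]
        b = [preds_b[i] for i in idx]
        diffs.append(accuracy(ys, a) - accuracy(ys, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = accuracy(labels, preds_a) - accuracy(labels, preds_b)
    return point, lower, upper


if __name__ == "__main__":
    # Synthetic data for illustration only: 100 cases, and the "photo"
    # predictions are wrong on the first 30 of them.
    labels = [1, 0] * 50
    preds_standard = list(labels)
    preds_photo = [1 - y if i < 30 else y for i, y in enumerate(labels)]
    point, lower, upper = paired_bootstrap_diff(labels, preds_standard, preds_photo)
    print(f"drop = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

Under this sketch, a drop would be called significant at the chosen level when the interval excludes zero. Any metric computed on resampled cases could replace `accuracy` without changing the rest of the procedure.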

Code & Models

Repositories

rajpurkarlab/chexpert-test-set-labels
