Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs

Finn Behrendt; Debayan Bhattacharya; Julia Krüger; Roland Opfer; Alexander Schlaefer

arXiv:2208.08166·cs.CV·August 18, 2022

TL;DR

This paper compares Vision Transformers (ViTs) and CNNs for multi-label disease classification on chest radiographs, focusing on data efficiency and on the advantage of DeiT variants when larger datasets are available.

Contribution

The paper systematically evaluates ViTs and CNNs on multi-label chest X-ray classification and shows that DeiT variants are more data-efficient and perform better when larger datasets are available.

Findings

01  ViTs perform comparably to CNNs on small datasets.

02  DeiT variants outperform ViTs with larger datasets.

03  Data efficiency of ViTs improves with dataset size.

Abstract

Radiographs are a versatile diagnostic tool for the detection and assessment of pathologies, for treatment planning, and for navigation and localization in clinical interventions. However, their interpretation and assessment by radiologists can be tedious and error-prone. Thus, a wide variety of deep learning methods have been proposed to support radiologists in interpreting radiographs. Most of these approaches rely on convolutional neural networks (CNNs) to extract features from images. CNNs have proven especially well suited to the multi-label classification of pathologies on chest radiographs (chest X-rays, CXR). In contrast, Vision Transformers (ViTs) have not been applied to this task, despite their high classification performance on generic images and their interpretable local saliency maps, which could add value to clinical interventions. ViTs do not rely on convolutions…
