Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification

Bidur Khanal; Prashant Shrestha; Sanskar Amgain; Bishesh Khanal; Binod Bhattarai; Cristian A. Linte

arXiv:2402.16734 · eess.IV · February 27, 2024 · 2 cites

TL;DR

This study evaluates how Vision Transformers compare to CNNs in handling label noise in medical image classification, highlighting the importance of pretraining for robustness.

Contribution

This study is the first to systematically compare the robustness of ViT and CNN architectures to label noise in medical imaging, with emphasis on the role of pretraining.

Findings

01. ViT outperforms CNNs under label noise conditions.

02. Pretraining significantly enhances ViT's robustness.

03. Robustness varies with noise levels and dataset characteristics.

Abstract

Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image…
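
To make the notion of a "label noise rate" concrete, the following is a minimal sketch of how label noise is commonly simulated on a clean dataset. It assumes symmetric (uniform) noise, where a fixed fraction of labels is flipped to a different class chosen at random; the excerpt above does not say which noise model the paper uses, and the function name `inject_symmetric_label_noise` and its parameters are hypothetical.

```python
import numpy as np


def inject_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Assumption: symmetric noise, i.e. each selected label is replaced by a
    different class drawn uniformly. Requires num_classes >= 2.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Pick which samples receive a wrong label (without replacement).
    n_flip = int(round(noise_rate * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)

    # A random offset in [1, num_classes - 1] guarantees the new label
    # differs from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes
    return noisy


# Illustrative usage on synthetic labels (not data from the paper).
clean = np.arange(1000) % 5
noisy = inject_symmetric_label_noise(clean, noise_rate=0.4, num_classes=5)
```

Training the same backbone on `noisy` at several values of `noise_rate` and evaluating on a clean test set is one way to observe the performance drop the abstract describes.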

Taxonomy

Topics: Image and Signal Denoising Methods

Methods: Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Vision Transformer · Attention Is All You Need · Softmax · Layer Normalization