Investigating the Robustness of Vision Transformers against Label Noise in Medical Image Classification
Bidur Khanal, Prashant Shrestha, Sanskar Amgain, Bishesh Khanal, Binod Bhattarai, Cristian A. Linte

TL;DR
This study evaluates how Vision Transformers compare to CNNs in handling label noise in medical image classification, highlighting the importance of pretraining for robustness.
Contribution
The paper presents itself as the first systematic comparison of the robustness of ViT and CNN architectures to label noise in medical imaging, with an emphasis on the role of pretraining.
Findings
ViT outperforms CNNs under label noise conditions.
Pretraining significantly enhances ViT's robustness.
Robustness varies with noise levels and dataset characteristics.
Abstract
Label noise in medical image classification datasets significantly hampers the training of supervised deep learning methods, undermining their generalizability. The test performance of a model tends to decrease as the label noise rate increases. Over recent years, several methods have been proposed to mitigate the impact of label noise in medical image classification and enhance the robustness of the model. Predominantly, these works have employed CNN-based architectures as the backbone of their classifiers for feature extraction. However, in recent years, Vision Transformer (ViT)-based backbones have replaced CNNs, demonstrating improved performance and a greater ability to learn more generalizable features, especially when the dataset is large. Nevertheless, no prior work has rigorously investigated how transformer-based backbones handle the impact of label noise in medical image…
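To make the notion of a label noise rate concrete, the following is a minimal sketch of how noisy labels could be simulated for such an experiment. It assumes symmetric (uniform) noise, which this excerpt does not confirm as the paper's noise model; the function name `inject_symmetric_noise` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` flipped.

    Assumption: symmetric noise, i.e. each corrupted label is replaced by a
    different class chosen uniformly at random. Requires num_classes >= 2.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_rate * len(noisy))
    # Pick distinct positions to corrupt so the realized rate matches n_flip.
    for i in rng.sample(range(len(noisy)), n_flip):
        # An offset in [1, num_classes - 1] guarantees the new label differs.
        offset = rng.randrange(1, num_classes)
        noisy[i] = (noisy[i] + offset) % num_classes
    return noisy


# Hypothetical toy data: 1000 samples, 5 classes, 30% noise.
data_rng = random.Random(1)
clean = [data_rng.randrange(5) for _ in range(1000)]
noisy = inject_symmetric_noise(clean, noise_rate=0.3, num_classes=5)
observed_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
```

In a study like this one, the same backbone would typically be trained on `noisy` labels at several noise rates and evaluated on clean test labels, so that the drop in test performance can be attributed to the noise rate.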
Taxonomy
Topics: Image and Signal Denoising Methods
Methods: Linear Layer · Byte Pair Encoding · Dropout · Dense Connections · Label Smoothing · Adam · Vision Transformer · Attention Is All You Need · Softmax · Layer Normalization
