Self-Ensembling Vision Transformer (SEViT) for Robust Medical Image Classification
Faris Almalik, Mohammad Yaqub, Karthik Nandakumar

TL;DR
This paper introduces SEViT, a self-ensembling approach that enhances the robustness of Vision Transformers in medical image classification against adversarial attacks by leveraging intermediate feature representations.
Contribution
The paper proposes a novel self-ensembling method for ViT that improves adversarial robustness and enables detection of adversarial samples in medical imaging tasks.
Findings
SEViT significantly reduces vulnerability to adversarial attacks.
The method also detects adversarial samples by checking whether the ensemble's classifiers give consistent predictions.
Experiments on chest X-ray and fundoscopy data validate robustness improvements.
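The idea of detecting adversarial samples through prediction consistency can be illustrated with a minimal sketch. It is not the authors' implementation: the agreement-fraction score, the function names, and the 0.8 threshold are all assumptions chosen for illustration, and the paper may use a different consistency measure.

```python
from collections import Counter


def consistency_score(predicted_labels):
    """Fraction of classifiers that agree with the most common label.

    `predicted_labels` holds one predicted class per classifier in the
    ensemble (hypothetical input format, assumed for this sketch).
    """
    if not predicted_labels:
        raise ValueError("need at least one prediction")
    _, top_count = Counter(predicted_labels).most_common(1)[0]
    return top_count / len(predicted_labels)


def is_suspected_adversarial(predicted_labels, threshold=0.8):
    """Flag an input when agreement falls below an assumed threshold."""
    return consistency_score(predicted_labels) < threshold


# Illustrative, made-up predictions from five classifiers.
clean_input = [1, 1, 1, 1, 1]       # all classifiers agree
perturbed_input = [0, 1, 1, 0, 2]   # classifiers disagree

print(is_suspected_adversarial(clean_input))
print(is_suspected_adversarial(perturbed_input))
```

In practice the threshold would have to be tuned on held-out clean and attacked samples; the value used here is only a placeholder.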
Abstract
Vision Transformers (ViT) are competing to replace Convolutional Neural Networks (CNN) for various computer vision tasks in medical imaging such as classification and segmentation. While the vulnerability of CNNs to adversarial attacks is a well-known problem, recent works have shown that ViTs are also susceptible to such attacks and suffer significant performance degradation under attack. The vulnerability of ViTs to carefully engineered adversarial samples raises serious concerns about their safety in clinical settings. In this paper, we propose a novel self-ensembling method to enhance the robustness of ViT in the presence of adversarial attacks. The proposed Self-Ensembling Vision Transformer (SEViT) leverages the fact that feature representations learned by initial blocks of a ViT are relatively unaffected by adversarial perturbations. Learning multiple classifiers based on these…
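The self-ensembling idea described in the abstract can be sketched as follows, under stated assumptions. The sketch assumes that each classifier (one per early ViT block, plus the final ViT head) has already produced a class-probability vector, and that the ensemble combines them by majority vote with ties broken by mean probability. The function names, the input format, and the tie-breaking rule are hypothetical and not taken from the paper.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def self_ensemble_predict(block_probs, final_probs):
    """Combine intermediate-block classifiers with the final ViT head.

    block_probs: list of per-class probability lists, one per classifier
                 trained on features from an early block (assumed format).
    final_probs: per-class probabilities from the final ViT head.
    Returns the class chosen by majority vote; ties are broken by the
    mean probability across all classifiers (an assumption of this sketch).
    """
    all_probs = block_probs + [final_probs]
    num_classes = len(final_probs)

    # Count one vote per classifier for its top-scoring class.
    votes = [0] * num_classes
    for probs in all_probs:
        votes[argmax(probs)] += 1

    mean_probs = [
        sum(p[c] for p in all_probs) / len(all_probs)
        for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: (votes[c], mean_probs[c]))


# Made-up example: the final head is fooled into class 1, while the
# classifiers on early-block features still predict class 0.
early_block_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
fooled_final_head = [0.1, 0.9]

print(self_ensemble_predict(early_block_probs, fooled_final_head))
```

The example reflects the motivation stated in the abstract: if early-block features are relatively unaffected by a perturbation, classifiers built on them can outvote a final head that has been misled.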
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Byte Pair Encoding
