Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu

TL;DR
This paper investigates whether Vision Transformer representations in medical imaging are semantically meaningful. It finds that they are not, and that they are vulnerable to small input changes, which undermines their reliability in critical applications.
Contribution
First systematic demonstration that ViT representations in medical imaging lack semantic meaningfulness and are susceptible to imperceptible changes.
Findings
Representations are not semantically meaningful.
Small, imperceptible changes to an image can drastically alter its representation.
Classification accuracy drops by over 60% due to minor perturbations.
Abstract
Vision transformers (ViTs) have rapidly gained prominence in medical imaging tasks such as disease classification, segmentation, and detection due to their superior accuracy compared to conventional deep learning models. However, due to their size and complex interactions via the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful. In this paper, using a projected gradient-based algorithm, we show that their representations are not semantically meaningful and they are inherently vulnerable to small changes. Images with imperceptible differences can have very different representations; on the other hand, images that should belong to different semantic classes can have nearly identical representations. Such vulnerability can lead to unreliable classification results; for…
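The abstract names a projected gradient-based algorithm but the listing gives no further detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy encoder (one linear layer plus tanh) in place of a real ViT, and the names `encode`, `match_representation`, `W`, `eps`, `step`, and `iters` are all hypothetical. The loop nudges a source image toward the representation of a different target image while keeping every pixel within `eps` of the original.

```python
import numpy as np

def encode(x, W):
    """Toy stand-in for a ViT encoder (assumption): linear layer + tanh."""
    return np.tanh(W @ x)

def match_representation(x_src, x_tgt, W, eps=0.05, step=0.01, iters=200):
    """Perturb x_src inside an L-infinity ball of radius eps so that its
    representation moves toward that of x_tgt (projected gradient descent)."""
    target = encode(x_tgt, W)
    x = x_src.copy()
    for _ in range(iters):
        h = np.tanh(W @ x)
        # Analytic gradient of 0.5 * ||h - target||^2 with respect to x.
        grad = W.T @ ((h - target) * (1.0 - h ** 2))
        x = x - step * np.sign(grad)              # signed gradient step
        x = np.clip(x, x_src - eps, x_src + eps)  # project back onto the eps-ball
        x = np.clip(x, 0.0, 1.0)                  # keep pixels in a valid range
    return x

# Hypothetical data: two random "images" flattened to 64 pixels in [0, 1].
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)) / 8.0
x_src = rng.uniform(size=64)
x_tgt = rng.uniform(size=64)

x_adv = match_representation(x_src, x_tgt, W)
before = np.linalg.norm(encode(x_src, W) - encode(x_tgt, W))
after = np.linalg.norm(encode(x_adv, W) - encode(x_tgt, W))
print(f"max pixel change: {np.max(np.abs(x_adv - x_src)):.3f}")
print(f"representation distance before: {before:.3f}, after: {after:.3f}")
```

With a real model, the analytic gradient would be replaced by automatic differentiation through the network. The projection step is what keeps the perturbation small enough to be visually negligible.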
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
