Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging

Montasir Shams; Chashi Mahiul Islam; Shaeke Salman; Phat Tran; Xiuwen Liu

arXiv:2507.01788 · cs.CV · July 11, 2025

Open Access

TL;DR

This paper investigates whether Vision Transformer (ViT) representations in medical imaging are semantically meaningful. It finds that they are not: small, imperceptible input changes can drastically alter a representation, which undermines the reliability of these models in critical applications.

Contribution

First systematic demonstration that ViT representations in medical imaging lack semantic meaningfulness and are susceptible to imperceptible changes.

Findings

01. Representations are not semantically meaningful.

02. Small changes can drastically alter representations.

03. Classification accuracy drops by over 60% due to minor perturbations.

Abstract

Vision transformers (ViTs) have rapidly gained prominence in medical imaging tasks such as disease classification, segmentation, and detection due to their superior accuracy compared to conventional deep learning models. However, due to their size and complex interactions via the self-attention mechanism, they are not well understood. In particular, it is unclear whether the representations produced by such models are semantically meaningful. In this paper, using a projected gradient-based algorithm, we show that their representations are not semantically meaningful and they are inherently vulnerable to small changes. Images with imperceptible differences can have very different representations; on the other hand, images that should belong to different semantic classes can have nearly identical representations. Such vulnerability can lead to unreliable classification results; for…
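
The abstract's idea of a projected gradient-based search can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder below is a toy `tanh(W x)` map standing in for a ViT, and the names `encode`, `match_representation`, `eps`, `step`, and `n_iter`, the L-infinity budget, and the signed-gradient update are all assumptions made for illustration. The sketch perturbs a source input within a small budget so that its representation moves toward that of a different target input.

```python
import numpy as np

def encode(W, x):
    """Toy stand-in encoder (an assumption, NOT a ViT): f(x) = tanh(W x)."""
    return np.tanh(W @ x)

def match_representation(W, x_src, x_tgt, eps=0.05, step=0.01, n_iter=200):
    """Projected gradient search for x_src + delta whose representation is
    close to that of x_tgt, subject to ||delta||_inf <= eps and pixel
    values staying in [0, 1]. Returns the best perturbed input seen."""
    target = encode(W, x_tgt)
    delta = np.zeros_like(x_src)
    best_delta, best_loss = delta.copy(), np.inf
    for _ in range(n_iter + 1):
        h = encode(W, x_src + delta)
        loss = 0.5 * np.sum((h - target) ** 2)
        if loss < best_loss:  # remember the best feasible point so far
            best_loss, best_delta = loss, delta.copy()
        # Analytic gradient of the loss w.r.t. delta for the tanh encoder.
        grad = W.T @ ((h - target) * (1.0 - h ** 2))
        # Signed-gradient descent step (hypothetical choice of update rule).
        delta = delta - step * np.sign(grad)
        # Projection 1: back into the L-infinity ball of radius eps.
        delta = np.clip(delta, -eps, eps)
        # Projection 2: keep the perturbed input a valid image in [0, 1].
        delta = np.clip(x_src + delta, 0.0, 1.0) - x_src
    return x_src + best_delta

# Synthetic data only; no medical images or trained model are involved.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)) / 8.0
x_src = rng.uniform(size=64)
x_tgt = rng.uniform(size=64)
x_adv = match_representation(W, x_src, x_tgt)

before = np.linalg.norm(encode(W, x_src) - encode(W, x_tgt))
after = np.linalg.norm(encode(W, x_adv) - encode(W, x_tgt))
print(f"representation distance before: {before:.4f}, after: {after:.4f}")
print(f"max pixel change: {np.max(np.abs(x_adv - x_src)):.4f}")
```

With a real model, `encode` would be replaced by the network's embedding function and the gradient would come from automatic differentiation. The two projection steps are what keep the change within the stated budget.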

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis