Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers
Chashi Mahiul Islam, Samuel Jacob Chacko, Mao Nishino, and Xiuwen Liu

TL;DR
This paper investigates the vulnerabilities of vision transformers to adversarial attacks, revealing how perturbations propagate through layers, and introduces NeuroShield-ViT, a defense mechanism that improves robustness without fine-tuning.
Contribution
The study provides a detailed analysis of representation vulnerabilities in ViT and proposes NeuroShield-ViT, a novel layer-neutralization defense method that enhances robustness against adversarial attacks.
Findings
NeuroShield-ViT achieves 77.8% accuracy on adversarial examples without fine-tuning.
Adversarial effects are subtle in early layers but propagate and amplify through the network, becoming most pronounced in middle to late layers.
NeuroShield-ViT outperforms traditional robustness methods against strong iterative attacks.
Abstract
While transformer-based models dominate NLP and vision applications, their underlying mechanisms to map the input space to the label space semantically are not well understood. In this paper, we study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations and semantically unrelated images can have the same representation. Our analysis indicates that imperceptible changes to the input can result in significant representation changes, particularly in later layers, suggesting potential instabilities in the performance of ViTs. Our comprehensive study reveals that adversarial effects, while subtle in early layers, propagate and amplify through the network, becoming most pronounced in middle to late layers. This insight motivates the development of NeuroShield-ViT, a novel defense mechanism…
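The layer-wise analysis described in the abstract can be illustrated with a minimal sketch that compares the representations of a clean input and a slightly perturbed input after each layer. This is not the authors' code or measurement: the toy residual stack, the random weights, the random-noise perturbation (standing in for an adversarial one), and the names `layerwise_representations`, `cosine_distance`, and `layerwise_drift` are all assumptions made for illustration. A real study would hook the hidden states of an actual ViT and use a proper attack.

```python
import numpy as np

def layerwise_representations(x, weights):
    """Return the activation after each layer of a toy residual stack.

    Each residual block is only a stand-in for a transformer layer.
    """
    reps = []
    h = x
    for w in weights:
        h = h + np.tanh(h @ w)  # residual update: h <- h + f(h)
        reps.append(h.copy())
    return reps

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layerwise_drift(x_clean, x_perturbed, weights):
    """Cosine distance between clean and perturbed representations, per layer."""
    clean = layerwise_representations(x_clean, weights)
    perturbed = layerwise_representations(x_perturbed, weights)
    return [cosine_distance(c, p) for c, p in zip(clean, perturbed)]

# Hypothetical setup: 12 layers of width 64 with random weights.
rng = np.random.default_rng(0)
dim, depth = 64, 12
weights = [rng.normal(scale=1.0 / np.sqrt(dim), size=(dim, dim)) for _ in range(depth)]

x_clean = rng.normal(size=dim)
# Small random perturbation; an assumption, not an adversarial attack.
x_perturbed = x_clean + 0.01 * rng.normal(size=dim)

drift = layerwise_drift(x_clean, x_perturbed, weights)
for layer, d in enumerate(drift, start=1):
    print(f"layer {layer:2d}: cosine distance = {d:.6f}")
```

Plotting `drift` against layer index is one simple way to see where in the stack a perturbation has the largest effect on the representation. Whether the curve grows with depth depends on the model and the perturbation, so this toy example should not be read as reproducing the paper's finding.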
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing
