Mechanistic Understandings of Representation Vulnerabilities and Engineering Robust Vision Transformers

Chashi Mahiul Islam; Samuel Jacob Chacko; Mao Nishino; and Xiuwen Liu

arXiv:2502.04679 · cs.CV · February 10, 2025

Open Access

TL;DR

This paper investigates the vulnerabilities of vision transformers to adversarial attacks, revealing how perturbations propagate through layers, and introduces NeuroShield-ViT, a defense mechanism that improves robustness without fine-tuning.

Contribution

The study provides a detailed analysis of representation vulnerabilities in ViT and proposes NeuroShield-ViT, a novel layer-neutralization defense method that enhances robustness against adversarial attacks.

Findings

01. NeuroShield-ViT achieves 77.8% accuracy on adversarial examples without fine-tuning.

02. Adversarial effects are subtle in the early layers of a ViT but propagate and amplify through the network, becoming most pronounced in middle to late layers.

03. NeuroShield-ViT outperforms traditional robustness methods against strong iterative attacks.

Abstract

While transformer-based models dominate NLP and vision applications, their underlying mechanisms to map the input space to the label space semantically are not well understood. In this paper, we study the sources of known representation vulnerabilities of vision transformers (ViT), where perceptually identical images can have very different representations and semantically unrelated images can have the same representation. Our analysis indicates that imperceptible changes to the input can result in significant representation changes, particularly in later layers, suggesting potential instabilities in the performance of ViTs. Our comprehensive study reveals that adversarial effects, while subtle in early layers, propagate and amplify through the network, becoming most pronounced in middle to late layers. This insight motivates the development of NeuroShield-ViT, a novel defense mechanism…
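The layer-wise observation in the abstract can be made concrete with a small measurement sketch. The code below is not the authors' method and does not use a real ViT: it assumes a toy stack of random dense layers with tanh activations, a random sign perturbation instead of an optimized attack, and a relative L2 distance as the drift metric. All function names (`make_layers`, `forward_with_activations`, `relative_change`, `layerwise_drift`) are hypothetical. It only illustrates how one might record the representation change at each layer between a clean and a slightly perturbed input.

```python
import math
import random


def make_layers(depth, width, seed=0):
    """Build a toy stack of random dense layers (a stand-in, not a real ViT)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)
    return [
        [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]


def forward_with_activations(layers, x):
    """Run the input through every layer and keep each layer's output."""
    activations = []
    h = list(x)
    for weights in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in weights]
        activations.append(h)
    return activations


def relative_change(a, b):
    """L2 distance between a and b, normalized by the L2 norm of a."""
    diff = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    norm = math.sqrt(sum(p * p for p in a))
    return diff / norm if norm > 0 else 0.0


def layerwise_drift(layers, clean, perturbed):
    """Per-layer relative representation change between two inputs."""
    clean_acts = forward_with_activations(layers, clean)
    pert_acts = forward_with_activations(layers, perturbed)
    return [relative_change(c, p) for c, p in zip(clean_acts, pert_acts)]


if __name__ == "__main__":
    rng = random.Random(1)
    width, depth, eps = 32, 6, 1e-3  # illustrative values, not from the paper
    layers = make_layers(depth, width)
    clean = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    # Random sign perturbation of size eps; a real attack would optimize this.
    perturbed = [v + eps * rng.choice((-1.0, 1.0)) for v in clean]
    for i, d in enumerate(layerwise_drift(layers, clean, perturbed), start=1):
        print(f"layer {i}: relative change = {d:.6f}")
```

Whether the drift grows with depth in this toy network depends on the random weights, so it should not be read as a reproduction of the paper's finding. To study an actual ViT, the same per-layer comparison would be applied to hidden states captured from the model on clean and adversarial images.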

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Memory and Neural Computing