Understanding Robustness of Visual State Space Models for Image Classification

Chengbin Du; Yanxi Li; Chang Xu

arXiv:2403.10935 · cs.CV · March 19, 2024 · 1 citation

TL;DR

This paper investigates the robustness of the Visual State Space Model (VMamba) for image classification. It finds strong resistance to certain adversarial attacks, alongside weaknesses in scalability and sensitivity to variations in image structure.

Contribution

The paper provides the first comprehensive analysis of VMamba's robustness, covering adversarial resistance, generalizability, and vulnerabilities, and offers insights for future improvements.

Findings

1. VMamba shows greater adversarial robustness than Transformer architectures.
2. VMamba generalizes well to out-of-distribution data.
3. The study identifies weaknesses in scalability and vulnerabilities to variations in image structure.

Abstract

Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. However, its robustness has not yet been thoroughly studied. In this paper, we delve into the robustness of this architecture through comprehensive investigations from multiple perspectives. Firstly, we investigate its robustness to adversarial attacks, employing both whole-image and patch-specific adversarial attacks. Results demonstrate superior adversarial robustness compared to Transformer architectures while revealing scalability weaknesses. Secondly, the general robustness of VMamba is assessed against diverse scenarios, including natural adversarial examples, out-of-distribution data, and common corruptions. VMamba exhibits exceptional generalizability with out-of-distribution data but shows scalability weaknesses against natural…
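The abstract distinguishes whole-image from patch-specific adversarial attacks but does not name the attacks used. As a minimal sketch of that distinction only, the code below applies one step of the fast gradient sign method (FGSM), a standard attack chosen here as an assumption, to a toy linear classifier over a flattened 8-pixel "image". The patch variant simply restricts the perturbation to a hypothetical 0/1 mask. The weights, inputs, and helper names (`score`, `input_gradient`, `fgsm`) are illustrative and are not taken from the paper or its repository.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    # Returns 1, -1, or 0.
    return (v > 0) - (v < 0)


def score(x, w, b):
    """Logit of a toy linear classifier (hypothetical stand-in for a real model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def input_gradient(x, w, b, y):
    """Gradient of the logistic loss log(1 + exp(-y * score)) w.r.t. x, for y in {-1, +1}."""
    coeff = -y * sigmoid(-y * score(x, w, b))
    return [coeff * wi for wi in w]


def fgsm(x, w, b, y, eps, mask=None):
    """One FGSM step: x + eps * sign(grad), clipped to the valid pixel range [0, 1].

    mask=None perturbs every pixel (whole-image attack); a 0/1 mask limits the
    perturbation to the selected pixels (a crude, hypothetical patch attack).
    """
    if mask is None:
        mask = [1] * len(x)
    grad = input_gradient(x, w, b, y)
    return [
        min(1.0, max(0.0, xi + eps * sign(gi) * mi))
        for xi, gi, mi in zip(x, grad, mask)
    ]


# Hypothetical toy setup: an 8-pixel "image" whose true label is y = +1.
w = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4, 0.2, -0.1]
b = 0.0
x = [0.5] * 8
y = 1
eps = 0.25

patch_mask = [1, 1, 1, 1, 0, 0, 0, 0]  # perturb only the first four pixels
x_whole = fgsm(x, w, b, y, eps)
x_patch = fgsm(x, w, b, y, eps, mask=patch_mask)

print("clean logit:      ", score(x, w, b))
print("whole-image logit:", score(x_whole, w, b))
print("patch logit:      ", score(x_patch, w, b))
```

With these made-up numbers, the whole-image perturbation pushes the logit below zero and flips the prediction, while the four-pixel patch lowers the logit without flipping it. The toy says nothing about how VMamba or Transformers actually behave; it only shows how a patch-restricted attack constrains where the perturbation may go.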

Code & Models

Repositories

jcruan519/petrobustness (PyTorch)

Taxonomy

Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Face and Expression Recognition

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Layer Normalization · Multi-Head Attention · Dropout · Residual Connection · Position-Wise Feed-Forward Layer · Byte Pair Encoding