Understanding Robustness of Visual State Space Models for Image Classification
Chengbin Du, Yanxi Li, Chang Xu

TL;DR
This paper thoroughly investigates the robustness of the Visual State Space Model (VMamba) for image classification. It finds that VMamba resists certain adversarial attacks well, but it also has weaknesses in scalability and is sensitive to variations in image structure.
Contribution
The paper provides the first comprehensive analysis of VMamba's robustness. It covers adversarial resistance, generalizability, and vulnerabilities, and offers insights for future improvements.
Findings
VMamba shows superior adversarial robustness compared to Transformers.
It exhibits strong generalizability to out-of-distribution data.
Scalability weaknesses and vulnerabilities to image structure variations are identified.
Abstract
Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. However, its robustness has not yet been thoroughly studied. In this paper, we delve into the robustness of this architecture through comprehensive investigations from multiple perspectives. Firstly, we investigate its robustness to adversarial attacks, employing both whole-image and patch-specific adversarial attacks. Results demonstrate superior adversarial robustness compared to Transformer architectures while revealing scalability weaknesses. Secondly, the general robustness of VMamba is assessed against diverse scenarios, including natural adversarial examples, out-of-distribution data, and common corruptions. VMamba exhibits exceptional generalizability to out-of-distribution data but shows scalability weaknesses against natural…
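The abstract contrasts whole-image and patch-specific adversarial attacks but does not spell out how they differ. The sketch below illustrates that difference and is not the paper's attack code or settings. It applies one step of FGSM (the fast gradient sign method) to a toy logistic-regression "model" on a flattened 2x2 image, and uses a boolean mask to confine the perturbation to a single patch. The helper names (`fgsm`, `patch_mask`, `input_gradient`, `bce_loss`), the toy weights, and the masking scheme are all illustrative assumptions. A real evaluation would compute input gradients through VMamba or a Transformer with an autodiff framework.

```python
import math
from typing import List, Optional

Vector = List[float]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def logit(x: Vector, w: Vector, b: float) -> float:
    # Toy linear "model" standing in for a real classifier (assumption).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def bce_loss(x: Vector, w: Vector, b: float, y: int) -> float:
    # Binary cross-entropy of the toy model's prediction against label y.
    p = sigmoid(logit(x, w, b))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(x: Vector, w: Vector, b: float, y: int) -> Vector:
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    p = sigmoid(logit(x, w, b))
    return [(p - y) * wi for wi in w]

def patch_mask(height: int, width: int, top: int, left: int, size: int) -> List[bool]:
    # Hypothetical helper: row-major flattened mask, True inside a size x size patch.
    return [top <= r < top + size and left <= c < left + size
            for r in range(height) for c in range(width)]

def fgsm(x: Vector, grad: Vector, eps: float, mask: Optional[List[bool]] = None) -> Vector:
    # One FGSM step: x + eps * sign(grad), clipped to the valid pixel range [0, 1].
    # mask=None perturbs the whole image; otherwise only masked pixels change.
    out = []
    for i, (xi, gi) in enumerate(zip(x, grad)):
        if mask is None or mask[i]:
            xi += eps * ((gi > 0) - (gi < 0))
        out.append(min(1.0, max(0.0, xi)))
    return out

# Toy 2x2 image (flattened row-major), fixed weights, true label 1.
x = [0.5, 0.5, 0.5, 0.5]
w, b, y = [1.0, -1.0, 1.0, -1.0], 0.5, 1
g = input_gradient(x, w, b, y)
x_whole = fgsm(x, g, eps=0.25)
x_patch = fgsm(x, g, eps=0.25, mask=patch_mask(2, 2, top=0, left=1, size=1))
print(x_whole)  # [0.25, 0.75, 0.25, 0.75]
print(x_patch)  # [0.5, 0.75, 0.5, 0.5]
```

With the same step size, the whole-image attack moves every pixel, while the patch-restricted attack moves only one. The whole-image attack therefore raises the loss more in this toy setting. Comparing how much a model's accuracy degrades under each kind of budget is one way to probe the robustness the abstract describes.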
Taxonomy
Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Face and Expression Recognition
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Layer Normalization · Multi-Head Attention · Dropout · Residual Connection · Position-Wise Feed-Forward Layer · Byte Pair Encoding
