Towards Evaluating the Robustness of Visual State Space Models

Hashmat Shadab Malik; Fahad Shamshad; Muzammal Naseer; Karthik Nandakumar; Fahad Shahbaz Khan; Salman Khan

arXiv:2406.09407·cs.CV·September 17, 2024

TL;DR

This paper evaluates the robustness of Vision State Space Models (VSSMs) against various natural and adversarial perturbations, comparing their performance with other architectures and analyzing their resilience in complex visual scenarios.

Contribution

It provides a comprehensive robustness evaluation of VSSMs across multiple perturbation types and benchmarks, highlighting their strengths and limitations in complex visual tasks.

Findings

01 VSSMs show robustness to certain corruptions but are vulnerable to others.

02 Frequency analysis reveals differential performance against low and high-frequency attacks.

03 VSSMs outperform some architectures in specific robustness scenarios.
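
To make the low- versus high-frequency distinction in the findings concrete, here is a minimal NumPy sketch of how a perturbation can be split into a low-frequency and a high-frequency part with a circular mask in the Fourier domain. This is an illustration only, not the authors' attack procedure, which this page does not describe. The function name `band_limit`, the cutoff `radius`, and the 8/255 perturbation budget are all assumptions chosen for the example.

```python
import numpy as np

def band_limit(perturbation, radius, keep="low"):
    """Keep only the low- or high-frequency part of a 2D perturbation.

    Hypothetical helper for illustration; `radius` is the cutoff, in
    frequency bins, measured from the centre of the shifted spectrum.
    """
    h, w = perturbation.shape
    # Move the zero-frequency (DC) component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    mask = low_mask if keep == "low" else ~low_mask
    # Undo the shift and return to the spatial domain.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Assumed example: random noise within an 8/255 budget on a 32x32 image.
rng = np.random.default_rng(0)
delta = rng.uniform(-8 / 255, 8 / 255, size=(32, 32))
low = band_limit(delta, radius=4, keep="low")
high = band_limit(delta, radius=4, keep="high")
```

Because the two masks are complementary, `low + high` reconstructs `delta` up to floating-point error. Adding either component alone to an input is one simple way to probe how a model responds to each frequency band.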

Abstract

Vision State Space Models (VSSMs), a novel architecture that combines the strengths of recurrent neural networks and latent variable models, have demonstrated remarkable performance in visual perception tasks by efficiently capturing long-range dependencies and modeling complex visual dynamics. However, their robustness under natural and adversarial perturbations remains a critical concern. In this work, we present a comprehensive evaluation of VSSMs' robustness under various perturbation scenarios, including occlusions, image structure, common corruptions, and adversarial attacks, and compare their performance to well-established architectures such as transformers and Convolutional Neural Networks. Furthermore, we investigate the resilience of VSSMs to object-background compositional changes on sophisticated benchmarks designed to test model performance in complex visual scenes. We…

Code & Models

Repositories

hashmatshadab/mambarobustness
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection