Exploring Adversarial Robustness of Deep State Space Models
Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

TL;DR
This paper investigates the adversarial robustness of Deep State Space Models (SSMs), analyzing how different components like Attention affect robustness and proposing an adaptive scaling method to improve defense against adversarial attacks.
Contribution
It provides a comprehensive empirical and theoretical analysis of SSMs under adversarial training, revealing the role of Attention and proposing a new adaptive scaling mechanism to enhance robustness.
Findings
Attention improves the robustness-generalization trade-off in SSMs.
Fixed-parameter SSMs have output error bounds related to their parameters.
Input-dependent SSMs may experience error explosion under adversarial perturbations.
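The second finding can be illustrated with a toy case. The sketch below is not the paper's bound or architecture: it assumes a diagonal, fixed-parameter linear SSM with positive stable entries, and all names (`run_ssm`, `a`, `b`, `c`, `eps`) are hypothetical. Because such a model is linear in its input, a perturbation of size at most `eps` per step changes the output by at most `eps * sum_i |c_i b_i| / (1 - a_i)`, a quantity set entirely by the parameters. This superposition argument no longer applies when the transition depends on the input.

```python
import numpy as np

def run_ssm(a, b, c, u):
    """Diagonal linear SSM with fixed parameters:
    x_t = a * x_{t-1} + b * u_t,  y_t = c . x_t  (scalar input and output)."""
    x = np.zeros_like(a)
    ys = []
    for u_t in u:
        x = a * x + b * u_t
        ys.append(float(c @ x))
    return np.array(ys)

rng = np.random.default_rng(0)
n, T, eps = 4, 200, 0.1

# Assumption: stable diagonal transition with 0 < a_i < 1.
a = rng.uniform(0.5, 0.9, size=n)
b = rng.normal(size=n)
c = rng.normal(size=n)

u = rng.normal(size=T)                    # clean input sequence
delta = rng.uniform(-eps, eps, size=T)    # bounded perturbation, |delta_t| <= eps

# Output error caused by the perturbation at every time step.
err = np.abs(run_ssm(a, b, c, u + delta) - run_ssm(a, b, c, u))
max_err = float(err.max())

# Parameter-dependent bound from the geometric series sum_k a_i^k = 1 / (1 - a_i).
bound = float(eps * np.sum(np.abs(c * b) / (1.0 - a)))

print(f"max output error {max_err:.4f} <= bound {bound:.4f}")
```

The random perturbation here is not adversarial; the bound holds for any perturbation within the `eps` budget, including a worst-case one.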
Abstract
Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs remains unclear. While many enhancements in SSM components, such as integrating Attention mechanisms and expanding to data-dependent SSM parameterizations, have brought significant gains in Standard Training (ST) settings, their potential benefits in AT remain unexplored. To investigate this, we evaluate existing structural variants of SSMs with AT to assess their AR performance. We observe that pure SSM structures struggle to benefit from AT, whereas incorporating Attention yields a markedly…
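For readers unfamiliar with Adversarial Training, a minimal sketch of PGD-based AT follows. It is a generic illustration on a logistic-regression model with synthetic data, not an SSM and not the paper's experimental setup; the function names, step sizes, and data are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_and_input_grad(w, b, x, y):
    """Per-sample binary cross-entropy and its gradient with respect to the input x."""
    p = sigmoid(x @ w + b)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y)[:, None] * w[None, :]
    return loss, grad_x

def pgd_attack(w, b, x, y, eps=0.1, step=0.025, iters=10):
    """L-infinity PGD: signed-gradient ascent on the loss, projected into the eps-ball."""
    x_adv = x.copy()
    for _ in range(iters):
        _, g = bce_and_input_grad(w, b, x_adv, y)
        x_adv = x_adv + step * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # projection step
    return x_adv

def adversarial_train(x, y, eps=0.1, lr=0.5, epochs=50):
    """Each epoch: craft adversarial inputs, then take a gradient step on them."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_adv = pgd_attack(w, b, x, y, eps=eps)
        p = sigmoid(x_adv @ w + b)
        w -= lr * x_adv.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-1.0, 0.5, size=(100, 2)),
               rng.normal(1.0, 0.5, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

eps = 0.1
w, b = adversarial_train(x, y, eps=eps)
x_adv = pgd_attack(w, b, x, y, eps=eps)

clean_loss = float(bce_and_input_grad(w, b, x, y)[0].mean())
adv_loss = float(bce_and_input_grad(w, b, x_adv, y)[0].mean())
print(f"clean loss {clean_loss:.4f}, adversarial loss {adv_loss:.4f}")
```

The inner loop maximizes the loss within the perturbation budget and the outer loop minimizes it on the resulting inputs; the same min-max structure applies when the model is an SSM and the gradients come from automatic differentiation.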
Peer Reviews
Decision: NeurIPS 2024 poster
- Originality: The paper addresses an important and under-explored area in the intersection of SSMs (that have become very prominent recently, both for text and vision) and adversarial robustness. It provides novel insights into how different SSM components behave under adversarial attacks and training, including important observations about robust overfitting.
- Quality: The work demonstrates high-quality research through its comprehensive empirical evaluations and some theoretical analysis. Th
- I don’t quite understand the claim about robust overfitting: based on Table 1, using AutoAttack, the “Diff” is always very small, almost always <1%. Why is there robust overfitting then?
- Limited scope of experiments: While the paper provides comprehensive evaluations on MNIST and CIFAR-10 datasets, it lacks experiments on more complex datasets (e.g., at least Tiny ImageNet or some dataset with a higher image resolution) or real-world scenarios. This limitation somewhat restricts the generali
1. The paper is well organized and easy to follow. 2. The experiments on several image classification benchmarks are solid and comprehensive. 3. The theoretical analysis and visualizations are helpful.
1. The tested datasets are small. 2. It would be better to add a result without AdS in Table 2 so that the improvement is easier to observe. 3. There is no analysis of why different activation functions in AdS influence performance.
1. The paper is well-written with a clear storyline and a suitable motivation. 2. The trustworthiness (e.g., robustness, explainability) research for SSM is an important topic and is yet to be fully explored. 3. The theoretical proof of the generalization bound is clear.
1. The evaluation dataset is not sufficiently scaled up to be representative. It would be more convincing if the paper developed similar findings on larger datasets such as CIFAR-100 and Tiny-ImageNet. 2. The involved adversarial training methods lack novelty. While PGD-AT and TRADES are classic AT methods, other representative variants can significantly improve AT efficiency and its robust performance. To name a few, Free-AT [1] substantially improves the training efficiency, and YOPO [2] boo
Taxonomy
Topics: Adversarial Robustness in Machine Learning
