START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation
Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao

TL;DR
START introduces a saliency-driven token-aware transformation within state space models. It improves domain generalization by suppressing domain-specific features and reaches state-of-the-art results while keeping linear complexity.
Contribution
The paper proposes an SSM-based architecture built around a saliency-driven token-aware transformation (START), which improves domain generalization and outperforms existing methods.
Findings
START achieves state-of-the-art performance on five benchmarks.
START reduces domain discrepancy by suppressing domain-specific features.
START maintains linear complexity, ensuring computational efficiency.
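The idea of perturbing only salient tokens can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: the summary does not say how saliency is computed or what the transformation is, so the sketch assumes saliency scores are given, picks the top fraction of tokens, and re-styles only those tokens with channel statistics mixed from another sample. The function name `perturb_salient_tokens` and the parameters `ratio` and `lam` are assumptions.

```python
import statistics


def perturb_salient_tokens(tokens, saliency, other_tokens, ratio=0.5, lam=0.5):
    """Re-style only the most salient tokens (illustrative assumption).

    tokens:       list of N tokens, each a list of D floats
    saliency:     list of N scores (how they are obtained is NOT specified here)
    other_tokens: tokens of another sample, used as the style source
    ratio:        fraction of tokens treated as salient (hypothetical knob)
    lam:          weight of the sample's own statistics in the mix (hypothetical)
    """
    n, d = len(tokens), len(tokens[0])
    k = max(1, int(n * ratio))
    # Indices of the k tokens with the highest saliency scores.
    order = sorted(range(n), key=lambda i: saliency[i], reverse=True)
    salient = set(order[:k])

    def channel_stats(toks):
        # Per-channel mean and standard deviation over all tokens.
        means = [statistics.fmean([t[c] for t in toks]) for c in range(d)]
        stds = [statistics.pstdev([t[c] for t in toks]) + 1e-6 for c in range(d)]
        return means, stds

    mu, sigma = channel_stats(tokens)
    mu_o, sigma_o = channel_stats(other_tokens)
    # Interpolate the statistics of the two samples.
    mu_mix = [lam * a + (1 - lam) * b for a, b in zip(mu, mu_o)]
    sigma_mix = [lam * a + (1 - lam) * b for a, b in zip(sigma, sigma_o)]

    out = []
    for i, tok in enumerate(tokens):
        if i in salient:
            # Normalize with own statistics, then apply the mixed statistics.
            out.append([(tok[c] - mu[c]) / sigma[c] * sigma_mix[c] + mu_mix[c]
                        for c in range(d)])
        else:
            # Non-salient tokens pass through unchanged.
            out.append(list(tok))
    return out
```

Restricting the perturbation to salient tokens leaves the remaining tokens untouched, which is the sense in which such a transformation would be token-aware. The real method may differ in every detail.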
Abstract
Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks by achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that…
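The linear-in-length, RNN-like computation mentioned in the abstract can be illustrated with a minimal sketch of a discretized diagonal state space recurrence, h_t = a_bar * h_(t-1) + b_bar * x_t and y_t = c · h_t. This is not Mamba itself: Mamba makes its parameters input-dependent, whereas the sketch below uses fixed parameters, and the names `ssm_scan`, `a_bar`, `b_bar`, and `c` are assumptions chosen for illustration.

```python
def ssm_scan(a_bar, b_bar, c, xs):
    """Run a diagonal linear state space recurrence over a scalar sequence.

    a_bar, b_bar, c: lists of length S (state size), fixed for all steps
    xs:              input sequence of length L
    Cost is O(L * S): one constant-size state update per input element.
    """
    state = [0.0] * len(a_bar)
    outputs = []
    for x_t in xs:
        # State update: each state dimension evolves independently.
        state = [a * h + b * x_t for a, b, h in zip(a_bar, b_bar, state)]
        # Readout: project the state to a scalar output.
        outputs.append(sum(ci * hi for ci, hi in zip(c, state)))
    return outputs
```

For example, `ssm_scan([0.5], [1.0], [1.0], [1.0, 1.0, 1.0])` returns `[1.0, 1.5, 1.75]`. Each step touches only the fixed-size state, so the work grows linearly with sequence length, in contrast to the quadratic pairwise comparisons of self-attention.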
Taxonomy
Topics: Distributed systems and fault tolerance · Business Process Modeling and Analysis · Service-Oriented Architecture and Web Services
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
