ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning
Wenjin Hou, Dingjie Fu, Kun Li, Shiming Chen, Hehe Fan, Yi Yang

TL;DR
ZeroMamba is a zero-shot learning framework built on a visual state space model (Vision Mamba). It captures long-range dependencies and complex visual dynamics, and reports significant gains over existing CNN- and ViT-based methods.
Contribution
The paper proposes ZeroMamba, a zero-shot learning framework built on a visual state space model, with three components: semantic-aware local projection, global representation learning, and semantic fusion.
Findings
Outperforms state-of-the-art methods on four ZSL benchmarks.
Effective in both conventional and generalized ZSL settings.
Demonstrates the benefits of visual state space modeling for ZSL.
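The two settings named above are usually scored differently. This summary does not say which metrics the paper reports, so the sketch below assumes the common protocol: mean per-class top-1 accuracy on unseen classes for conventional ZSL, and the harmonic mean of seen and unseen accuracy for generalized ZSL. The function names are illustrative and not taken from the paper.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Mean of per-class top-1 accuracies (each class weighted equally)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    if not total:
        return 0.0
    return sum(correct[c] / total[c] for c in total) / len(total)


def harmonic_mean(seen_acc, unseen_acc):
    """H = 2 * S * U / (S + U); low if either accuracy is low."""
    if seen_acc + unseen_acc == 0:
        return 0.0
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)


# Toy labels (made up for illustration, not results from the paper).
unseen_acc = per_class_accuracy(["a", "a", "b"], ["a", "b", "b"])
h = harmonic_mean(0.8, 0.4)
```

The harmonic mean penalizes a model that does well on seen classes but poorly on unseen ones, which is why it is the usual headline number in the generalized setting.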
Abstract
Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, however, these visual backbones achieve suboptimal visual-semantic interactions. In this paper, motivated by the visual state space model (i.e., Vision Mamba), which is capable of capturing long-range dependencies and modeling complex visual dynamics, we propose a parameter-efficient ZSL framework called ZeroMamba to advance ZSL. Our ZeroMamba comprises three key components: Semantic-aware Local Projection (SLP), Global Representation Learning…
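To make "visual-semantic interaction" concrete, here is a minimal sketch of the general embedding-based ZSL idea: map a visual feature into the semantic (attribute) space and pick the class whose attribute vector is most similar. It is not ZeroMamba's architecture. The projection matrix `W`, the attribute vectors, and all function names are hypothetical stand-ins for whatever a trained model would provide.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def project(W, x):
    """Map a visual feature x into semantic space: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def predict(x, W, class_attributes):
    """Return the class whose attribute vector best matches the projected feature."""
    s = project(W, x)
    return max(class_attributes, key=lambda c: cosine(s, class_attributes[c]))


# Hypothetical setup: 3-d visual feature, 2-d attributes [striped, aquatic].
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
unseen_classes = {"zebra": [1.0, 0.0], "dolphin": [0.0, 1.0]}

label = predict([0.9, 0.2, 0.5], W, unseen_classes)  # "zebra"
```

Because the classifier only needs attribute vectors for the candidate classes, classes with no training images can be recognized as long as their semantic descriptions are available.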
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
