ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

Wenjin Hou; Dingjie Fu; Kun Li; Shiming Chen; Hehe Fan; Yi Yang

arXiv:2408.14868 · cs.CV · December 12, 2024

TL;DR

ZeroMamba is a zero-shot learning framework built on a visual state space model. It captures long-range dependencies and complex visual dynamics, and significantly improves performance over existing CNN-based and ViT-based methods.

Contribution

The paper proposes ZeroMamba, a novel zero-shot learning framework utilizing a visual state space model with semantic-aware local projection, global representation learning, and semantic fusion.

Findings

01. Outperforms state-of-the-art methods on four ZSL benchmarks.
02. Effective in both conventional and generalized ZSL settings.
03. Demonstrates the benefits of visual state space modeling for ZSL.

Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, however, these visual backbones achieve suboptimal visual-semantic interactions. In this paper, motivated by the visual state space model (i.e., Vision Mamba), which is capable of capturing long-range dependencies and modeling complex visual dynamics, we propose a parameter-efficient ZSL framework called ZeroMamba to advance ZSL. Our ZeroMamba comprises three key components: Semantic-aware Local Projection (SLP), Global Representation Learning…
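The visual-semantic interaction described above can be illustrated with a minimal sketch. This is not ZeroMamba's architecture: it assumes a generic linear projection `W` from visual features into the semantic (attribute) space and cosine-similarity matching against class attribute vectors. The names `predict`, `W`, `class_attrs`, `seen_ids`, and `unseen_ids` are hypothetical, and the toy data is made up for illustration. The same function covers both settings listed in the findings: conventional ZSL searches only unseen classes, while generalized ZSL searches seen and unseen classes together.

```python
import numpy as np

def predict(visual_feats, W, class_attrs, candidate_ids):
    """Assign each image to the candidate class whose attribute vector
    is most similar (cosine) to the projected visual feature."""
    projected = visual_feats @ W  # (n_images, n_attributes)
    projected = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    attrs = class_attrs[candidate_ids]  # (n_candidates, n_attributes)
    attrs = attrs / np.linalg.norm(attrs, axis=1, keepdims=True)
    scores = projected @ attrs.T  # cosine similarities
    return np.asarray(candidate_ids)[scores.argmax(axis=1)]

# Toy setup (assumption): 4 classes described by 4 one-hot attributes.
rng = np.random.default_rng(0)
class_attrs = np.eye(4)
seen_ids, unseen_ids = [0, 1], [2, 3]

# In a real system W is learned on seen classes only; identity is a placeholder.
W = np.eye(4)

# Three test images from unseen classes 2, 3, 2, with small feature noise.
visual_feats = class_attrs[[2, 3, 2]] + 0.05 * rng.standard_normal((3, 4))

czsl_pred = predict(visual_feats, W, class_attrs, unseen_ids)             # conventional ZSL
gzsl_pred = predict(visual_feats, W, class_attrs, seen_ids + unseen_ids)  # generalized ZSL
```

Because unseen classes are recognized purely through their attribute vectors, no training image of classes 2 or 3 is needed. The generalized setting is harder in practice because a model trained only on seen classes tends to bias its predictions toward them.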

Taxonomy

Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces