SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou, Yunxiang Fu, Yizhou Yu

TL;DR
SparX introduces a sparse cross-layer connection mechanism inspired by human visual system principles, enhancing feature interaction and efficiency in vision backbone networks, leading to improved accuracy with fewer parameters.
Contribution
The paper proposes a novel sparse cross-layer connection mechanism, SparX, for vision models, inspired by retinal ganglion cells, enabling efficient multi-layer feature aggregation with reduced computational costs.
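The cost argument behind sparse cross-layer connections can be shown with a toy count. The sketch below is hypothetical. The "hub" placement rule, the `hub_every` parameter, and the dot-product softmax used for input-dependent aggregation are illustrative assumptions, not the connection rule or operator defined in the paper. It compares a dense (DenseNet-style) pattern, where every layer reads all earlier layers, with a sparse one where only a few hub layers aggregate multi-layer features.

```python
import math
from typing import Dict, List


def dense_inputs(num_layers: int) -> Dict[int, List[int]]:
    # Dense (DenseNet-style) pattern: layer i reads every earlier layer j < i.
    return {i: list(range(i)) for i in range(num_layers)}


def sparse_inputs(num_layers: int, hub_every: int) -> Dict[int, List[int]]:
    # Hypothetical sparse pattern: every `hub_every`-th layer is a "hub".
    # Ordinary layers read only their immediate predecessor; hub layers
    # additionally read the outputs of all earlier hubs.
    hubs = [i for i in range(num_layers) if (i + 1) % hub_every == 0]
    pattern: Dict[int, List[int]] = {}
    for i in range(num_layers):
        inputs = [i - 1] if i > 0 else []
        if i in hubs:
            # Skip a hub that is already the immediate predecessor.
            inputs = [h for h in hubs if h < i - 1] + inputs
        pattern[i] = inputs
    return pattern


def num_edges(pattern: Dict[int, List[int]]) -> int:
    # Total number of cross-layer inputs (graph edges) in a pattern.
    return sum(len(srcs) for srcs in pattern.values())


def aggregate(current: List[float], earlier: List[List[float]]) -> List[float]:
    # Input-dependent aggregation at a hub: score each candidate feature by its
    # scaled dot product with the current feature, softmax the scores, and
    # return the weighted sum. An attention-style stand-in, not SparX's operator.
    candidates = earlier + [current]
    d = len(current)
    scores = [sum(c * q for c, q in zip(cand, current)) / math.sqrt(d)
              for cand in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * cand[k] for w, cand in zip(weights, candidates))
            for k in range(d)]


if __name__ == "__main__":
    # prints: 66 17
    print(num_edges(dense_inputs(12)), num_edges(sparse_inputs(12, 3)))
```

With 12 layers and a hub every third layer, the dense pattern has 66 cross-layer inputs while this sparse variant has 17. Reductions of this kind are what make multi-layer aggregation affordable in deeper backbones.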
Findings
SparX-Mamba-T improves the top-1 accuracy of VMamba-T from 82.5% to 83.5%.
SparX-Swin-T achieves a 1.3% increase in top-1 accuracy over Swin-T.
The mechanism offers a favorable trade-off among model size, computational cost, and accuracy compared with its counterparts.
Abstract
Due to the capability of dynamic state space models (SSMs) in capturing long-range dependencies with linear-time computational complexity, Mamba has shown notable performance in NLP tasks. This has inspired the rapid development of Mamba-based vision models, resulting in promising results in visual recognition tasks. However, such models are not capable of distilling features across layers through feature aggregation, interaction, and selection. Moreover, existing cross-layer feature aggregation methods designed for CNNs or ViTs are not practical in Mamba-based models due to high computational costs. Therefore, this paper aims to introduce an efficient cross-layer feature aggregation mechanism for vision backbone networks. Inspired by the Retinal Ganglion Cells (RGCs) in the human visual system, we propose a new sparse cross-layer connection mechanism termed SparX to effectively improve…
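The abstract credits SSMs with capturing long-range dependencies in linear time. The toy recurrence below illustrates why. It is a minimal, single-channel sketch of a Mamba-style selective scan, assuming zero-order-hold discretization for the state transition, a simplified Euler-style input term, and only the step size `delta` made input-dependent (in Mamba, B and C are input-dependent too). The names `selective_scan`, `a`, `w_delta`, `b`, and `c` are hypothetical.

```python
import math
from typing import List


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)); keeps the step size positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def selective_scan(xs: List[float], a: float = -1.0, w_delta: float = 1.0,
                   b: float = 1.0, c: float = 1.0) -> List[float]:
    # Single-channel, scalar-state sketch of a selective SSM recurrence:
    #   delta_t = softplus(w_delta * x_t)          input-dependent step size
    #   h_t     = exp(delta_t * a) * h_{t-1}       zero-order-hold state decay
    #             + delta_t * b * x_t              simplified input term
    #   y_t     = c * h_t
    # A negative `a` makes the state decay, so old inputs fade gradually.
    h = 0.0
    ys: List[float] = []
    for x in xs:  # one pass: cost is linear in sequence length
        delta = softplus(w_delta * x)
        h = math.exp(delta * a) * h + delta * b * x
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    print(selective_scan([1.0, 0.0, 0.0]))
```

Each step depends only on the previous state `h`, so the loop visits every token once. Practical implementations vectorize this over channels and state dimensions, and Mamba additionally uses a hardware-aware parallel scan rather than a Python loop.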
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Vision and Imaging · Infrared Target Detection Methodologies
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
