OCTOPUS: Enhancing the Spatial-Awareness of Vision SSMs with Multi-Dimensional Scans and Traversal Selection
Kunal Mahatha, Ali Bahri, Pierre Marza, Sahar Dastani, Maria Vakalopoulou, Stergios Christodoulidis, Jose Dolz, Christian Desrosiers

TL;DR
OCTOPUS introduces a novel spatially-aware vision model that combines multi-directional recurrence with state space models to improve local and global spatial understanding while maintaining linear complexity.
Contribution
It presents OCTOPUS, a new architecture that enhances spatial awareness in vision SSMs through multi-dimensional scans and traversal selection, addressing limitations of causality in traditional SSMs.
Findings
Improves boundary preservation and region consistency in segmentation.
Maintains better classification accuracy than existing V-SSM models.
Demonstrates scalability and effectiveness in vision tasks.
Abstract
State space models (SSMs) have recently emerged as an alternative to transformers due to their unique ability of modeling global relationships in text with linear complexity. However, their success in vision tasks has been limited due to their causal formulation, which is suitable for sequential text but detrimental in the spatial domain where causality breaks the inherent spatial relationships among pixels or patches. As a result, standard SSMs fail to capture local spatial coherence, often linking non-adjacent patches while ignoring neighboring ones that are visually correlated. To address these limitations, we introduce OCTOPUS , a novel architecture that preserves both global context and local spatial structure within images, while maintaining the linear complexity of SSMs. OCTOPUS performs discrete reoccurrence along eight principal orientations, going forward or backward in the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFerroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Neural Networks and Reservoir Computing
