Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution

Chen Wu; Ling Wang; Zhuoran Zheng; Xiangyu Chen; Jingyuan Xia; Weidong Jiang; Jiantao Zhou

arXiv:2605.11934·cs.CV·May 13, 2026

TL;DR

This paper introduces an efficient depth super-resolution framework that enables dense, semantic-level interactions between RGB and depth features using a novel state space model with linear complexity.

Contribution

The paper proposes an Interactive State Space Model with cross-modal local scanning for guided depth super-resolution, pairing dense semantic interaction between RGB and depth features with linear-complexity computation.

Findings

1. Achieves competitive performance against state-of-the-art methods.

2. Enables dense semantic interactions with linear complexity.

3. Introduces a cross-modal matching transform module.

Abstract

Guided depth super-resolution (GDSR) reconstructs high-resolution (HR) depth maps from low-resolution (LR) inputs with HR RGB guidance. Existing methods either model each modality independently or rely on attention mechanisms whose cost grows quadratically, which hinders efficient, semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates a more flexible approach that enables dense, semantically aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered around the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves…
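To make the idea of scanning two modalities in one linear-time pass more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the window-by-window interleaving of RGB and depth tokens, the function names (`window_order`, `cross_modal_local_sequence`, `linear_scan`, `scatter_depth`), and the fixed scalar coefficients `a` and `b` are all assumptions made for illustration. Mamba uses input-dependent parameters, and this summary does not describe the paper's exact scanning order.

```python
import numpy as np

def window_order(h, w, win):
    """Flat indices of an h x w grid, visited one win x win window at a time
    (row-major inside each window). Hypothetical ordering for illustration."""
    idx = np.arange(h * w).reshape(h, w)
    order = [idx[i:i + win, j:j + win].reshape(-1)
             for i in range(0, h, win)
             for j in range(0, w, win)]
    return np.concatenate(order)

def cross_modal_local_sequence(rgb, depth, win):
    """Build one token sequence in which the RGB and depth features of the
    same pixel sit next to each other, following the local window order.
    rgb, depth: arrays of shape (h, w, c). Returns (sequence, order)."""
    h, w, c = rgb.shape
    order = window_order(h, w, win)
    seq = np.empty((2 * h * w, c), dtype=np.float64)
    seq[0::2] = rgb.reshape(h * w, c)[order]    # even slots: RGB tokens
    seq[1::2] = depth.reshape(h * w, c)[order]  # odd slots: depth tokens
    return seq, order

def linear_scan(seq, a, b):
    """Diagonal linear recurrence s_t = a * s_{t-1} + b * x_t.
    One pass over the sequence, so the cost is linear in its length."""
    state = np.zeros(seq.shape[1], dtype=np.float64)
    out = np.empty(seq.shape, dtype=np.float64)
    for t, x in enumerate(seq):
        state = a * state + b * x
        out[t] = state
    return out

def scatter_depth(out, order, h, w):
    """Pick the outputs at depth positions and place them back on the grid."""
    grid = np.empty((h * w, out.shape[1]), dtype=out.dtype)
    grid[order] = out[1::2]
    return grid.reshape(h, w, -1)

# Toy usage: 4x4 single-channel features scanned in 2x2 windows.
rgb = np.arange(16, dtype=np.float64).reshape(4, 4, 1)
depth = rgb * 10
seq, order = cross_modal_local_sequence(rgb, depth, win=2)
fused_depth = scatter_depth(linear_scan(seq, a=0.5, b=1.0), order, 4, 4)
```

Because each depth token is updated from a running state that has just absorbed the RGB token at the same location, and before that its window neighbours, the depth output mixes in nearby RGB information without any pairwise attention matrix. The work grows with the number of tokens rather than with its square.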
