Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution
Chen Wu, Ling Wang, Zhuoran Zheng, Xiangyu Chen, Jingyuan Xia, Weidong Jiang, Jiantao Zhou

TL;DR
This paper introduces an efficient depth super-resolution framework that enables dense, semantic-level interactions between RGB and depth features using a novel state space model with linear complexity.
Contribution
It proposes an interactive state space model with cross-modal local scanning for guided depth super-resolution, aiming to combine semantic-level RGB-depth interaction with linear computational complexity.
Findings
Achieves competitive performance against state-of-the-art methods.
Enables dense semantic interactions with linear complexity.
Introduces a cross-modal matching transform module.
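The linear-complexity claim rests on the recurrent form of state space models: each token updates a fixed-size hidden state instead of attending to every other token. The following is a minimal sketch of a diagonal linear SSM scan, not the paper's model. Several simplifications are assumptions made for illustration: the parameters are fixed rather than input-dependent (the selective variant used in Mamba makes Δ, B and C functions of the input), the input discretization uses the simple Δ·B form, and B and C are shared across channels. The function name `linear_ssm_scan` is hypothetical.

```python
import numpy as np


def linear_ssm_scan(x, A, B, C, delta):
    """Run a diagonal linear state space model over a token sequence.

    Hypothetical illustration, not the paper's implementation.
    x:     (L, D) sequence of L tokens with D channels.
    A:     (D, N) negative real diagonal state matrix for each channel.
    B, C:  (N,) input and output projections, shared across channels (assumption).
    delta: scalar step size (fixed, i.e. non-selective; assumption).
    """
    L, D = x.shape
    A_bar = np.exp(delta * A)        # (D, N) discretized state transition
    B_bar = delta * B                # (N,) simplified input discretization
    h = np.zeros_like(A, dtype=float)  # (D, N) hidden state, fixed size
    y = np.empty((L, D))
    for t in range(L):
        # One state update per token: cost O(D * N), independent of L.
        h = A_bar * h + B_bar[None, :] * x[t][:, None]
        y[t] = h @ C                 # (D,) readout for token t
    return y


# Usage: an impulse at t = 0 decays by exp(delta * A) at each later step.
impulse = np.array([[1.0], [0.0], [0.0]])
print(linear_ssm_scan(impulse, np.array([[-1.0]]), np.array([1.0]), np.array([1.0]), 1.0))
```

For L tokens the loop performs L updates of a D × N state, so the cost grows linearly with L, whereas self-attention over the same tokens forms an L × L score matrix. Practical Mamba implementations replace the Python loop with a hardware-aware parallel scan, but the asymptotic cost is the same.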
Abstract
Guided depth super-resolution (GDSR) reconstructs high-resolution (HR) depth maps from low-resolution (LR) inputs with HR RGB guidance. Existing methods either model each modality independently or rely on computationally expensive attention mechanisms with quadratic complexity, hindering the establishment of efficient and semantically interactive joint representations. In this paper, we observe that feature maps from different modalities exhibit semantic-level correlations during feature extraction. This motivates us to develop a more flexible approach enabling dense, semantically aware deep interactions between modalities. To this end, we propose a novel GDSR framework centered on the Interactive State Space Model. Specifically, we design a cross-modal local scanning mechanism that enables fine-grained semantic interactions between RGB and depth features. Leveraging the Mamba architecture, our framework achieves…
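The abstract does not specify how cross-modal local scanning orders the tokens, so the following is only a hypothetical illustration of the general idea: a 1-D state space scan mixes tokens that are adjacent in the sequence, so placing RGB and depth tokens from the same spatial neighborhood next to each other lets the scan combine the two modalities locally. The window size `w`, the per-pixel interleaving, and the function names `cross_modal_local_scan` and `cross_modal_local_unscan` are assumptions, not the authors' design.

```python
import numpy as np


def cross_modal_local_scan(rgb, depth, w):
    """Flatten two aligned (H, W, C) feature maps into one (2*H*W, C) sequence.

    Hypothetical ordering: non-overlapping w x w windows are visited in raster
    order; inside each window, the RGB token and the depth token of the same
    pixel are placed next to each other.
    """
    H, W, C = rgb.shape
    assert depth.shape == rgb.shape and H % w == 0 and W % w == 0

    def to_windows(f):
        # (H, W, C) -> (H//w, w, W//w, w, C) -> (H//w, W//w, w, w, C)
        return f.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)

    r = to_windows(rgb).reshape(-1, C)    # RGB tokens, window by window
    d = to_windows(depth).reshape(-1, C)  # depth tokens in the same order
    # Interleave as r0, d0, r1, d1, ... so paired tokens are adjacent.
    return np.stack([r, d], axis=1).reshape(-1, C)


def cross_modal_local_unscan(seq, H, W, w):
    """Invert cross_modal_local_scan, returning the (rgb, depth) maps."""
    C = seq.shape[-1]
    pairs = seq.reshape(-1, 2, C)

    def from_windows(tokens):
        # (H*W, C) -> (H//w, W//w, w, w, C) -> (H//w, w, W//w, w, C) -> (H, W, C)
        return (tokens.reshape(H // w, W // w, w, w, C)
                .transpose(0, 2, 1, 3, 4).reshape(H, W, C))

    return from_windows(pairs[:, 0]), from_windows(pairs[:, 1])


# Usage: 4 x 4 maps with one channel, 2 x 2 windows.
rgb = np.arange(16.0).reshape(4, 4, 1)
depth = rgb + 100.0
seq = cross_modal_local_scan(rgb, depth, w=2)
print(seq[:8, 0])  # first window: pixels (0,0), (0,1), (1,0), (1,1), each RGB then depth
```

With this ordering each RGB token is one step away from its depth counterpart, and both stay close to their spatial neighbors within the window, so a recurrent scan over `seq` mixes the modalities locally while its cost remains linear in the number of tokens.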
