Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation

Zhangcheng Hou; Tomoaki Ohtsuki

arXiv:2605.11840 · cs.CV · May 13, 2026

TL;DR

This paper introduces Radar-Modulated Selection (RMS), a radar-camera depth estimation method that injects radar into the selective scan of a Mamba backbone instead of fusing features outside it, and reports state-of-the-art accuracy and efficiency.

Contribution

The paper proposes RMS, an approach in which radar modulates the backbone's selective scan from within, enabling linear-cost cross-modal coupling and a better fallback to image-only processing.

Findings

01. Achieves state-of-the-art depth estimation on nuScenes, with a 34% reduction in MAE.

02. RMS has the lowest single-frame latency, at 26.8 ms.

03. In-scan selection replaces out-of-scan fusion without loss of accuracy.

Abstract

Radar-camera depth estimation must turn an ultra-sparse, all-weather, metric radar signal into a dense per-pixel depth map. Existing methods -- concatenation, confidence-aware gating, sparse supervision, graph-based extraction -- combine radar and image features outside the backbone's sequence operator, and even cross-modal Mamba variants leave the selection mechanism itself unimodal. We argue that the selection mechanism is the right place for radar to enter. We introduce Radar-Modulated Selection (RMS), a minimal and principled way to inject radar into Mamba's selective scan: radar modulates the scan from within, adding zero-initialised perturbations to the step size $Δ$ and readout $C$ while leaving the input projection $B$ and state dynamics $A$ image-only. The construction is exactly equivalent to a pretrained image-only Mamba at initialisation,…
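
The mechanism described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the diagonal state-space form, the sequential loop, the parameter names (`W_delta`, `U_delta`, `U_C`, and so on), the choice to add the radar term before the softplus, and the assumption that radar returns are already rasterised onto the image token grid are all hypothetical choices made for illustration. What it shows is the property stated above: with the radar projections initialised to zero, the radar-modulated scan reproduces the image-only scan exactly.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def selective_scan(x, delta, A, B, C):
    """Sequential diagonal selective scan (cost linear in sequence length).

    x: (L, D) tokens, delta: (L, D) step sizes, A: (D, N) state dynamics,
    B: (L, N) input projection, C: (L, N) readout. Returns y: (L, D).
    """
    L, D = x.shape
    h = np.zeros((D, A.shape[1]))
    y = np.empty((L, D))
    for t in range(L):
        dA = np.exp(delta[t][:, None] * A)                        # (D, N)
        dBx = delta[t][:, None] * B[t][None, :] * x[t][:, None]   # (D, N)
        h = dA * h + dBx
        y[t] = h @ C[t]                                           # (D,)
    return y

def init_params(rng, D, N, R):
    # All names here are hypothetical. U_delta and U_C are the radar
    # projections; they start at zero, as the abstract describes.
    return {
        "A": -np.exp(rng.normal(size=(D, N))),   # negative for stability
        "W_delta": 0.1 * rng.normal(size=(D, D)),
        "b_delta": np.zeros(D),
        "W_B": 0.1 * rng.normal(size=(D, N)),
        "W_C": 0.1 * rng.normal(size=(D, N)),
        "U_delta": np.zeros((R, D)),
        "U_C": np.zeros((R, N)),
    }

def rms_scan(x, r, p):
    """Radar-modulated scan sketch. x: image tokens (L, D); r: radar tokens (L, R)."""
    delta_pre = x @ p["W_delta"] + p["b_delta"]   # image-only step size (pre-activation)
    B = x @ p["W_B"]                              # input projection stays image-only
    C = x @ p["W_C"]                              # image-only readout
    # Radar enters only as additive perturbations to the step size and readout.
    delta_pre = delta_pre + r @ p["U_delta"]
    C = C + r @ p["U_C"]
    return selective_scan(x, softplus(delta_pre), p["A"], B, C)
```

Calling `rms_scan(x, r, p)` with freshly initialised parameters gives the same output whether `r` holds radar features or zeros, which is the image-only fallback. Once `U_delta` or `U_C` move away from zero, radar changes how long each token is retained and how the state is read out, while `A` and `B` remain functions of the image alone.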
