DepthMamba with Adaptive Fusion
Zelin Meng, Zhichen Wang

TL;DR
This paper introduces DepthMamba, a robust multi-view depth estimation method that adaptively fuses single-view and multi-view results using an attention mechanism, performing well under noisy pose conditions.
Contribution
It proposes a novel two-branch network with adaptive fusion and a new benchmark for evaluating depth estimation under pose noise.
Findings
Performs well on challenging scenes with dynamic objects and texture-less regions.
Achieves competitive results on KITTI and DDAD benchmarks.
Demonstrates robustness to noisy camera pose inputs.
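The summary does not say how pose noise is generated for the robustness benchmark. As a rough illustration only, the sketch below perturbs a camera pose with Gaussian noise on a yaw rotation and on the translation. The function name perturb_pose, the Gaussian noise model, the restriction to yaw, and the parameter names are all assumptions made for this example; they are not taken from the paper.

```python
import math
import random


def perturb_pose(rotation, translation, rot_sigma_deg, trans_sigma, rng):
    """Return a noisy copy of a camera pose (hypothetical noise model).

    rotation is a 3x3 matrix as nested lists and translation is a 3-vector.
    The noise is a Gaussian yaw angle plus Gaussian translation offsets.
    """
    angle = math.radians(rng.gauss(0.0, rot_sigma_deg))
    c, s = math.cos(angle), math.sin(angle)
    # Small rotation about the y axis, applied on the left of the input rotation.
    noise_rot = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    noisy_rotation = [
        [sum(noise_rot[i][k] * rotation[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]
    noisy_translation = [t + rng.gauss(0.0, trans_sigma) for t in translation]
    return noisy_rotation, noisy_translation


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rng = random.Random(0)
noisy_R, noisy_t = perturb_pose(identity, [0.0, 0.0, 1.0], 2.0, 0.05, rng)
```

Sweeping rot_sigma_deg and trans_sigma over increasing values would give a family of noise levels at which a depth model could be evaluated, which is the general idea of testing under "various noisy pose settings".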
Abstract
Multi-view depth estimation has achieved impressive performance on various benchmarks. However, almost all current multi-view systems rely on ideal camera poses being given, which are unavailable in many real-world scenarios, such as autonomous driving. In this work, we propose a new robustness benchmark to evaluate depth estimation systems under various noisy pose settings. Surprisingly, we find that current multi-view depth estimation methods, as well as single-view and multi-view fusion methods, fail under noisy pose settings. To tackle this challenge, we propose a two-branch network architecture that fuses the depth estimates of a single-view branch and a multi-view branch. Specifically, we introduce Mamba as the feature extraction backbone and propose an attention-based fusion method that adaptively selects the more robust estimate between the two branches. Thus, the…
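To make the idea of adaptively fusing two depth estimates concrete, here is a minimal sketch that blends a single-view and a multi-view depth map using per-pixel weights obtained from a two-way softmax over confidence scores. This is a simplified stand-in, not the paper's attention-based fusion module: the function name fuse_depths and the logit inputs are hypothetical, and in a real network the scores would be predicted from features rather than supplied by hand.

```python
import math


def fuse_depths(depth_single, depth_multi, logit_single, logit_multi):
    """Per-pixel convex combination of two depth maps (illustrative only).

    All arguments are equally sized 2D lists. The logits are hypothetical
    per-pixel confidence scores for each branch.
    """
    fused = []
    for ds_row, dm_row, ls_row, lm_row in zip(
        depth_single, depth_multi, logit_single, logit_multi
    ):
        row = []
        for ds, dm, ls, lm in zip(ds_row, dm_row, ls_row, lm_row):
            # Two-way softmax written as a sigmoid of the logit difference;
            # the difference is clamped to avoid overflow in math.exp.
            diff = max(-60.0, min(60.0, lm - ls))
            w_single = 1.0 / (1.0 + math.exp(diff))
            row.append(w_single * ds + (1.0 - w_single) * dm)
        fused.append(row)
    return fused


# Equal confidence gives the plain average of the two estimates.
fused = fuse_depths([[10.0]], [[20.0]], [[0.0]], [[0.0]])
```

When the multi-view score drops, for instance because the pose is unreliable, the weight shifts toward the single-view estimate, which is the behavior the abstract describes at a high level.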
Taxonomy
Topics: Robotic Path Planning Algorithms · Modular Robots and Swarm Intelligence
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
