M${^2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
Yingshuang Zou, Yikang Ding, Xi Qiu, Haoqian Wang, Haotian Zhang

TL;DR
M${^2}$Depth is a self-supervised multi-camera depth estimation network that effectively combines spatial and temporal information to produce accurate, scale-aware surrounding depth for autonomous driving, outperforming previous methods.
Contribution
The paper introduces a novel spatial-temporal fusion module and combines neural priors with internal features to improve multi-camera depth estimation accuracy.
Findings
Achieves state-of-the-art results on nuScenes and DDAD benchmarks.
Effectively integrates spatial and temporal information for depth estimation.
Reduces ambiguity between foreground and background using neural priors.
Abstract
This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M${^2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M${^2}$Depth takes temporally adjacent two-frame images from multiple cameras as inputs and produces high-quality surrounding depth. We first construct cost volumes in the spatial and temporal domains individually and propose a spatial-temporal fusion module that integrates the spatial-temporal information to yield a strong volume representation. We additionally combine the neural prior from SAM features with internal features to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on nuScenes…
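To make the idea of combining a spatial and a temporal cost volume concrete, here is a minimal NumPy sketch. It is not the paper's fusion module, which is learned. As a stand-in, it assumes both volumes hold per-pixel matching scores over the same depth bins, blends them with a fixed weight, and regresses depth as the softmax-weighted mean of the bins. The function names, the `w_spatial` parameter, and the 1 m to 80 m bin range are all hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_and_regress(spatial_vol, temporal_vol, depth_bins, w_spatial=0.5):
    """Blend two (D, H, W) score volumes and regress a (H, W) depth map.

    Assumption: higher score means a better match at that depth bin.
    The fixed-weight blend stands in for a learned fusion module.
    """
    fused = w_spatial * spatial_vol + (1.0 - w_spatial) * temporal_vol
    prob = softmax(fused, axis=0)  # per-pixel distribution over depth bins
    return (prob * depth_bins[:, None, None]).sum(axis=0)  # expected depth


# Hypothetical setup: 32 depth hypotheses between 1 m and 80 m.
depth_bins = np.linspace(1.0, 80.0, 32)
rng = np.random.default_rng(0)
spatial = rng.normal(size=(32, 4, 6))   # stand-in for cross-camera scores
temporal = rng.normal(size=(32, 4, 6))  # stand-in for cross-frame scores
depth = fuse_and_regress(spatial, temporal, depth_bins)
```

Because the output is an expectation over the depth bins, every predicted value lies between the smallest and largest bin, and a volume that is sharply peaked at one bin yields that bin's depth.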
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Advanced Image and Video Retrieval Techniques
Methods: Segment Anything Model
