M$^{2}$Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation

Yingshuang Zou; Yikang Ding; Xi Qiu; Haoqian Wang; Haotian Zhang

arXiv:2405.02004·cs.CV·May 6, 2024·1 cites

TL;DR

M$^{2}$Depth is a self-supervised multi-camera depth estimation network that combines spatial and temporal information to produce accurate, scale-aware surrounding depth for autonomous driving. It reports state-of-the-art results on the nuScenes and DDAD benchmarks.

Contribution

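The paper introduces a spatial-temporal fusion module that integrates separately built spatial and temporal cost volumes, and combines a neural prior from SAM (Segment Anything Model) features with the network's internal features to improve multi-camera depth estimation accuracy.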

Findings

01. Achieves state-of-the-art results on the nuScenes and DDAD benchmarks.

02. Effectively integrates spatial and temporal information for depth estimation.

03. Reduces ambiguity between foreground and background using neural priors.

Abstract

This paper presents a novel self-supervised two-frame multi-camera metric depth estimation network, termed M$^{2}$Depth, which is designed to predict reliable scale-aware surrounding depth in autonomous driving. Unlike previous works that use multi-view images from a single time-step or multiple time-step images from a single camera, M$^{2}$Depth takes temporally adjacent two-frame images from multiple cameras as inputs and produces high-quality surrounding depth. We first construct cost volumes in the spatial and temporal domains individually and propose a spatial-temporal fusion module that integrates the spatial-temporal information to yield a strong volume representation. We additionally combine the neural prior from SAM features with internal features to reduce the ambiguity between foreground and background and strengthen the depth edges. Extensive experimental results on nuScenes…
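As a rough illustration of the idea of fusing a spatial and a temporal cost volume and then reading out depth, here is a minimal NumPy sketch. It is not the paper's method: the abstract describes a learned spatial-temporal fusion module, whereas this sketch swaps in a fixed scalar blend followed by a standard soft-argmin over depth hypotheses. The function names (`fuse_volumes`, `regress_depth`), the `spatial_weight` parameter, the `(D, H, W)` volume layout, and the depth bin values are all assumptions made for illustration.

```python
import numpy as np


def fuse_volumes(spatial_cost, temporal_cost, spatial_weight=0.5):
    """Blend two (D, H, W) matching-cost volumes with a fixed scalar weight.

    Hypothetical stand-in for a learned fusion module; lower cost means
    a better match at that depth hypothesis.
    """
    if spatial_cost.shape != temporal_cost.shape:
        raise ValueError("cost volumes must share the same (D, H, W) shape")
    return spatial_weight * spatial_cost + (1.0 - spatial_weight) * temporal_cost


def regress_depth(cost, depth_bins):
    """Soft-argmin: turn costs into per-pixel probabilities over depth bins
    and return the expected depth, shape (H, W)."""
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=0, keepdims=True)
    # Contract the depth axis: (D,) with (D, H, W) gives (H, W).
    return np.tensordot(depth_bins, prob, axes=(0, 0))


# Toy example with assumed values: 4 depth hypotheses on a 2x3 image.
depth_bins = np.array([1.0, 2.0, 4.0, 8.0])
spatial = np.full((4, 2, 3), 50.0)
spatial[2] = 0.0                   # spatial cue strongly prefers the 4.0 bin
temporal = np.zeros((4, 2, 3))     # temporal cue is uninformative here
depth_map = regress_depth(fuse_volumes(spatial, temporal), depth_bins)
```

In this toy case the temporal volume carries no preference, so the fused result follows the spatial cue. A learned module could instead weight the two volumes per pixel, for example trusting the temporal cue less where the two frames show little camera motion.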

Taxonomy

Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Advanced Image and Video Retrieval Techniques

Methods: Segment Anything Model