Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation

Patrick Ruhkamp; Daoyi Gao; Hanzhi Chen; Nassir Navab; Benjamin Busam

arXiv:2110.08192·cs.CV·October 18, 2021

TL;DR

This paper introduces a spatial-temporal attention framework with geometric regularization for self-supervised monocular depth estimation, aiming to improve depth consistency across consecutive frames while preserving accuracy.

Contribution

The paper proposes a geometry-guided spatial-temporal attention mechanism, together with regularized loss formulations, to improve depth consistency and accuracy in self-supervised monocular depth estimation.

Findings

01 Improved temporal depth stability over previous methods

02 Enhanced geometric consistency across frames

03 Better depth accuracy with the proposed attention modules

Abstract

Inferring geometrically consistent dense 3D scenes across a tuple of temporally consecutive images remains challenging for self-supervised monocular depth prediction pipelines. This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy. We propose a spatial attention module that correlates coarse depth predictions to aggregate local geometric information. A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images. Additionally, we introduce geometric constraints between frames regularized by photometric cycle consistency. By combining our proposed regularization and the novel spatial-temporal-attention module we fully leverage both the geometric and appearance-based consistency across…
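To make the idea of attending across consecutive frames concrete, here is a minimal sketch of generic scaled dot-product attention over per-frame feature tokens. It is not the paper's module, whose details are not given in this excerpt: the function name `temporal_attention`, the tensor layout `(T, N, D)`, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention(frames, target=0):
    """Hypothetical helper: attend from one frame to the tokens of all frames.

    frames: array of shape (T, N, D) with T frames, N tokens per frame,
            and D feature channels per token (an assumed layout).
    target: index of the frame whose tokens act as queries.
    Returns the attended features (N, D) and the attention weights (N, T*N).
    """
    T, N, D = frames.shape
    q = frames[target]                 # queries from the target frame, (N, D)
    kv = frames.reshape(T * N, D)      # keys and values from every frame
    scores = q @ kv.T / np.sqrt(D)     # scaled dot-product similarity, (N, T*N)
    weights = softmax(scores, axis=-1) # each query's weights sum to 1
    out = weights @ kv                 # weighted sum of values, (N, D)
    return out, weights

# Toy input: 3 consecutive frames, 16 tokens each, 8 channels per token.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 16, 8))
out, w = temporal_attention(feats, target=1)
```

Each token of the target frame receives a convex combination of tokens drawn from all frames, which is the basic mechanism by which attention can share information in a global context across a temporal window. A real model would add learned projections and would operate on features derived from the network's coarse depth predictions.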

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Computer Graphics and Visualization Techniques

Methods: Convolution · Sigmoid Activation · Average Pooling · Max Pooling