Attention meets Geometry: Geometry Guided Spatial-Temporal Attention for Consistent Self-Supervised Monocular Depth Estimation
Patrick Ruhkamp, Daoyi Gao, Hanzhi Chen, Nassir Navab, Benjamin Busam

TL;DR
This paper introduces a spatial-temporal attention framework with geometric regularization for self-supervised monocular depth estimation, improving depth consistency and accuracy across consecutive frames.
Contribution
It proposes a geometry-guided attention mechanism and regularized loss formulations that improve depth consistency and accuracy in self-supervised monocular depth estimation.
Findings
Improved temporal depth stability over previous methods
Enhanced geometric consistency across frames
Better depth accuracy with the proposed attention modules
Abstract
Inferring geometrically consistent dense 3D scenes across a tuple of temporally consecutive images remains challenging for self-supervised monocular depth prediction pipelines. This paper explores how the increasingly popular transformer architecture, together with novel regularized loss formulations, can improve depth consistency while preserving accuracy. We propose a spatial attention module that correlates coarse depth predictions to aggregate local geometric information. A novel temporal attention mechanism further processes the local geometric information in a global context across consecutive images. Additionally, we introduce geometric constraints between frames regularized by photometric cycle consistency. By combining our proposed regularization and the novel spatial-temporal-attention module we fully leverage both the geometric and appearance-based consistency across…
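The temporal attention idea in the abstract, relating per-frame features to one another across consecutive images, can be illustrated with generic scaled dot-product attention. The following is only a minimal sketch and not the authors' module: the function names, the tensor layout `(T, N, C)`, the use of identity projections for queries, keys, and values, and the random inputs are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    """Standard attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def temporal_attention(frame_feats):
    """Hypothetical helper: let every token attend to all tokens of all frames.

    frame_feats has shape (T, N, C): T consecutive frames, N spatial tokens
    per frame, C feature channels. Learned projections are omitted here
    (an assumption), so queries, keys, and values are the raw features.
    """
    T, N, C = frame_feats.shape
    tokens = frame_feats.reshape(T * N, C)  # flatten frames into one sequence
    out, weights = scaled_dot_product_attention(tokens, tokens, tokens)
    return out.reshape(T, N, C), weights


# Placeholder features for 3 consecutive frames, 16 tokens each, 8 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 16, 8))
fused, weights = temporal_attention(feats)
```

Here each row of `weights` is a distribution over the tokens of all three frames, so `fused` mixes information across time as well as space. A real model would add learned query/key/value projections and operate on depth-aware features rather than random values.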
Taxonomy
Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Computer Graphics and Visualization Techniques
Methods: Convolution · Sigmoid Activation · Average Pooling · Max Pooling
