GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes

Chaoqiang Zhao, Matteo Poggi, Fabio Tosi, Lei Zhou, Qiyu Sun, Yang Tang, Stefano Mattoccia

arXiv:2309.16019 · cs.CV · September 29, 2023 · 2 citations

TL;DR

GasMono introduces a novel framework combining geometry-based pose refinement and vision transformer-based depth guidance to improve self-supervised monocular depth estimation in indoor scenes, achieving state-of-the-art results.
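Geometry-based coarse poses depend on the epipolar constraint between two calibrated views. That constraint fixes the direction of the translation but not its length. The pure-Python sketch below assumes normalized image coordinates and a toy rotation about the optical axis. The names `rot_z`, `project_pair`, and `epipolar_residual` are illustrative and do not come from the GasMono code. The sketch checks that x_2^T [t]_x R x_1 vanishes, and that scaling the translation and the scene by the same factor leaves both projections unchanged. This per-sequence scale ambiguity is one way to see why geometric poses need rescaling before they can be used consistently across training scenes.

```python
import math
from typing import List, Tuple

Vec3 = List[float]
Mat3 = List[List[float]]


def matvec(A: Mat3, v: Vec3) -> Vec3:
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def skew(t: Vec3) -> Mat3:
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def rot_z(angle: float) -> Mat3:
    """Rotation about the camera z axis (an illustrative choice of motion)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def project_pair(R: Mat3, t: Vec3, X: Vec3) -> Tuple[Vec3, Vec3]:
    """Normalized image coordinates of point X (given in frame 1) in both frames."""
    x1 = [X[0] / X[2], X[1] / X[2], 1.0]
    X2 = [a + b for a, b in zip(matvec(R, X), t)]  # X2 = R X + t
    x2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]
    return x1, x2


def epipolar_residual(R: Mat3, t: Vec3, X: Vec3) -> float:
    """x2^T E x1 with E = [t]_x R; zero for a geometrically consistent view pair."""
    x1, x2 = project_pair(R, t, X)
    E = matmul(skew(t), R)
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))


if __name__ == "__main__":
    R, t, X = rot_z(0.3), [0.2, -0.1, 0.05], [0.5, 0.3, 4.0]
    print("epipolar residual:", epipolar_residual(R, t, X))
    # Scaling translation and scene depth together leaves both views unchanged,
    # so the length of t cannot be recovered from the two images alone.
    s = 3.0
    print(project_pair(R, t, X))
    print(project_pair(R, [s * v for v in t], [s * v for v in X]))
```

The two printed projection pairs agree up to floating-point rounding, for any positive `s`.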

Contribution

The paper proposes a new method that refines coarse camera poses during training and integrates vision transformers with self-distillation to enhance depth estimation accuracy.
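Self-distillation means the network's own predictions act as training targets. The sketch below shows one common form of this idea, in plain Python over flattened per-pixel lists. A stored pseudo-label is replaced per pixel only when a new prediction achieves a lower photometric (view-reconstruction) error. The distillation loss pulls the prediction toward the pseudo-label only where the label is strictly better. Several parts are assumptions made for illustration: the selection rule, the log-L1 loss, and the names `update_pseudo_label` and `distillation_loss`. The paper's overfitting-aware criterion is not reproduced here.

```python
import math
from typing import List, Tuple


def distillation_loss(pred: List[float], pred_err: List[float],
                      label: List[float], label_err: List[float]) -> float:
    """Mean |log d_pred - log d_label| over pixels where the stored pseudo-label
    has a strictly lower photometric error than the current prediction."""
    terms = [abs(math.log(d) - math.log(dl))
             for d, e, dl, el in zip(pred, pred_err, label, label_err)
             if el < e]
    return sum(terms) / len(terms) if terms else 0.0


def update_pseudo_label(pred: List[float], pred_err: List[float],
                        label: List[float], label_err: List[float]
                        ) -> Tuple[List[float], List[float]]:
    """Per pixel, keep whichever depth currently has the lower photometric error."""
    new_label, new_err = [], []
    for d, e, dl, el in zip(pred, pred_err, label, label_err):
        if e < el:
            new_label.append(d)
            new_err.append(e)
        else:
            new_label.append(dl)
            new_err.append(el)
    return new_label, new_err


if __name__ == "__main__":
    # Toy 3-pixel depth maps with per-pixel photometric errors (arbitrary values).
    pred, pred_err = [1.0, 2.0, 4.0], [0.1, 0.3, 0.2]
    label, label_err = [2.0, 1.0, 4.0], [0.2, 0.1, 0.2]
    print("distillation loss:", distillation_loss(pred, pred_err, label, label_err))
    label, label_err = update_pseudo_label(pred, pred_err, label, label_err)
    print("updated pseudo-label:", label, label_err)
```

The loss is computed against the old pseudo-label before the update. As a result, the network is only pulled toward targets that already reconstruct the neighboring views better than its current output.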

Findings

1. Outperforms previous methods on multiple indoor datasets.
2. Achieves state-of-the-art accuracy among self-supervised monocular depth estimation methods for indoor scenes.
3. Demonstrates strong generalization across diverse indoor scenes.

Abstract

This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture. We ease the learning process by obtaining coarse camera poses from monocular sequences through multi-view geometry to deal with the former. However, we found that limited by the scale ambiguity across different scenes in the training dataset, a naïve introduction of geometric coarse poses cannot play a positive role in performance improvement, which is counter-intuitive. To address this problem, we propose to refine those poses during training through rotation and translation/scale optimization. To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism, providing more accurate depth guidance coming from the network itself.…
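One way to picture "rotation and translation/scale optimization" is as a learned residual applied on top of a coarse geometric pose. The sketch below makes several assumptions. The correction is parameterized as a small axis-angle rotation `delta_w`, applied with Rodrigues' formula. A scalar `scale` sets the unknown length of the unit-norm geometric translation, and an additive `delta_t` adjusts it further. The names and this parameterization are hypothetical and are not taken from the GasMono code.

```python
import math
from typing import List, Tuple

Vec3 = List[float]
Mat3 = List[List[float]]


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def axis_angle_to_matrix(w: Vec3) -> Mat3:
    """Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2,
    where theta = |w| and K is the cross-product matrix of the unit axis w / |w|."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    a, b = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]


def refine_pose(R_coarse: Mat3, t_coarse: Vec3, delta_w: Vec3,
                scale: float, delta_t: Vec3) -> Tuple[Mat3, Vec3]:
    """Hypothetical correction of a coarse geometric pose (R_coarse, t_coarse):
    R = exp([delta_w]_x) R_coarse,  t = scale * t_coarse + delta_t.
    During training, delta_w, scale and delta_t would be predicted by a network."""
    R = matmul(axis_angle_to_matrix(delta_w), R_coarse)
    t = [scale * a + b for a, b in zip(t_coarse, delta_t)]
    return R, t


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Unit-norm translation direction, as multi-view geometry would return it.
    R, t = refine_pose(I3, [0.0, 0.0, 1.0], [0.0, 0.0, 0.01], 0.25, [0.0, 0.0, 0.0])
    print(R)
    print(t)
```

In this reading, the geometric rotation serves as the starting point and only a small residual is learned. Meanwhile, `scale` absorbs the per-scene scale ambiguity. The paper's actual refinement may be structured differently.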

Code & Models

Repositories

zxcqlf/gasmono (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Robotics and Sensor-Based Localization