Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer
Guodong Sun, Junjie Liu, Mingxuan Liu, Moyun Liu, Yang Zhang

TL;DR
This paper presents a self-supervised monocular depth estimation model that integrates multiple priors, including spatial, context, and semantic information, using a hybrid transformer to improve depth prediction accuracy in complex scenes.
Contribution
The paper introduces a multi-prior approach that combines a hybrid transformer with a semantic boundary loss to improve scene understanding in depth estimation.
Findings
Improved depth estimation accuracy across three datasets.
Enhanced generalization in complex and untextured regions.
Effective integration of spatial, context, and semantic priors.
Abstract
Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure and texture. Nevertheless, solely relying on a single type of prior information often falls short when dealing with complex scenes, necessitating improvements in generalization performance. To address these challenges, we introduce a novel self-supervised monocular depth estimation model that leverages multiple priors to bolster representation capabilities across spatial, context, and semantic dimensions. Specifically, we employ a hybrid transformer and a lightweight pose network to obtain…
Taxonomy
Topics: Optical measurement and interference techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications
