Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer

Guodong Sun; Junjie Liu; Mingxuan Liu; Moyun Liu; Yang Zhang

arXiv:2406.08928 · cs.CV · June 14, 2024

TL;DR

This paper presents a self-supervised monocular depth estimation model that integrates multiple priors, including spatial, context, and semantic information, using a hybrid transformer to improve depth prediction accuracy in complex scenes.

Contribution

The paper introduces a novel multi-prior approach utilizing a hybrid transformer and semantic boundary loss to enhance scene understanding in depth estimation.

Findings

1. Improved depth estimation accuracy across three datasets.
2. Enhanced generalization in complex and untextured regions.
3. Effective integration of spatial, context, and semantic priors.

Abstract

Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data. However, the lack of labeled information poses a significant challenge to the model's representation, limiting its ability to capture the intricate details of the scene accurately. Prior information can potentially mitigate this issue, enhancing the model's understanding of scene structure and texture. Nevertheless, solely relying on a single type of prior information often falls short when dealing with complex scenes, necessitating improvements in generalization performance. To address these challenges, we introduce a novel self-supervised monocular depth estimation model that leverages multiple priors to bolster representation capabilities across spatial, context, and semantic dimensions. Specifically, we employ a hybrid transformer and a lightweight pose network to obtain…

Code & Models

Repositories

mvme-hbut/mprlnet · PyTorch · Official

Taxonomy

Topics: Optical measurement and interference techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications