Towards Scale-Aware Full Surround Monodepth with Transformers

Yuchen Yang; Xinyi Wang; Dong Li; Lu Tian; Ashish Sirasao; Xun Yang

arXiv:2407.10406 · cs.CV · July 16, 2024

TL;DR

This paper introduces a scale-aware full surround monodepth (FSM) method built on transformers. It improves depth estimation accuracy by optimizing both the network structure and the training pipeline, and it outperforms existing FSM methods without relying on median-scaling at test time.

Contribution

The paper proposes a transformer-based depth network with neighbor-enhanced cross-view attention and a progressive training scheme to improve scale-awareness in FSM methods.

Findings

01. Significant accuracy improvement over state-of-the-art FSM methods.

02. Effective scale-awareness without median-scaling at test time.

03. Better cross-view context aggregation via transformer-based modules.
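
For context on the second finding: median-scaling is a common evaluation step for scale-ambiguous self-supervised depth, in which each prediction is multiplied by the ratio of the ground-truth median to the predicted median before metrics are computed. The sketch below illustrates that step on a toy example. The function names, the use of zero to mark pixels without ground truth, and the choice of absolute relative error as the metric are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def median_scale(pred, gt):
    """Rescale pred so that its median matches the ground-truth median."""
    valid = gt > 0  # assumption: zero marks pixels without ground truth
    ratio = np.median(gt[valid]) / np.median(pred[valid])
    return pred * ratio

def abs_rel(pred, gt):
    """Mean absolute relative error over valid ground-truth pixels."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Toy depth map: the prediction has the right structure but the wrong scale.
gt = np.array([[2.0, 4.0], [6.0, 0.0]])
pred_ambiguous = gt * 0.5

error_raw = abs_rel(pred_ambiguous, gt)
error_scaled = abs_rel(median_scale(pred_ambiguous, gt), gt)
```

A scale-aware method aims to keep the unscaled error low, so that the ground-truth-dependent rescaling step is not needed at test time.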

Abstract

Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to improve FSM from two perspectives: depth network structure optimization and training pipeline optimization. First, we construct a transformer-based depth network with neighbor-enhanced cross-view attention (NCA). The cross-attention modules can better aggregate the cross-view context in both global and neighboring views. Second, we formulate a transformer-based feature matching scheme with progressive training to improve the structure-from-motion (SfM) pipeline. That allows us to learn…
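
The abstract describes cross-attention that aggregates context from both global and neighboring views, but the excerpt does not specify how the two are combined. The following is a minimal NumPy sketch of one possible reading, not the authors' implementation. It assumes the cameras form a ring, that each view contributes a fixed number of feature tokens, that no learned query/key/value projections are used, and that the global and neighbor branches are merged by a plain residual sum. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(feats, neighbors_only):
    """Each view's tokens attend to tokens from other views.

    feats: array of shape (V, T, D) with V views, T tokens, D channels.
    neighbors_only: if True, attend only to the two ring-adjacent views;
                    otherwise attend to every other view.
    """
    V, T, D = feats.shape
    out = np.empty_like(feats)
    for i in range(V):
        if neighbors_only:
            others = [(i - 1) % V, (i + 1) % V]  # assumption: ring camera rig
        else:
            others = [j for j in range(V) if j != i]
        context = feats[others].reshape(-1, D)            # (len(others) * T, D)
        weights = softmax(feats[i] @ context.T / np.sqrt(D))
        out[i] = weights @ context                        # (T, D)
    return out

def neighbor_enhanced_attention(feats):
    """Assumption: combine global and neighbor branches by a residual sum."""
    global_ctx = cross_view_attention(feats, neighbors_only=False)
    neighbor_ctx = cross_view_attention(feats, neighbors_only=True)
    return feats + global_ctx + neighbor_ctx

# Six surround cameras, four tokens each, eight channels.
rng = np.random.default_rng(0)
features = rng.standard_normal((6, 4, 8))
fused = neighbor_enhanced_attention(features)
```

The neighbor branch restricts attention to views that are likely to overlap spatially, while the global branch lets every view see the full rig. A real network would add learned projections, multiple heads, and normalization.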

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Advanced Materials and Mechanics · Interactive and Immersive Displays

Methods: Softmax · Attention Is All You Need · Focus