Towards Scale-Aware Full Surround Monodepth with Transformers
Yuchen Yang, Xinyi Wang, Dong Li, Lu Tian, Ashish Sirasao, Xun Yang

TL;DR
This paper introduces a scale-aware full surround monodepth (FSM) method built on transformers. It improves depth estimation accuracy by optimizing both the network structure and the training pipeline, and it outperforms existing FSM methods without relying on median-scaling at test time.
Contribution
The paper proposes a transformer-based depth network with neighbor-enhanced cross-view attention (NCA) and a transformer-based feature matching scheme with progressive training, both aimed at improving the scale-awareness of FSM methods.
Findings
Significant accuracy improvement over state-of-the-art FSM methods.
Effective scale-awareness without median-scaling at test time.
Better cross-view context aggregation via transformer-based modules.
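For readers unfamiliar with median-scaling, the sketch below shows the standard evaluation trick that scale-ambiguous monocular methods rely on and that a scale-aware method can skip. It is a toy illustration, not the paper's evaluation code: the depth values are made up, and the helper names `abs_rel` and `median_scale` are hypothetical.

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative error between predicted and true depth."""
    return float(np.mean(np.abs(pred - gt) / gt))

def median_scale(pred, gt):
    """Rescale the prediction so its median matches the ground-truth median.

    This uses ground truth at test time, which is why a scale-aware
    method is evaluated without it.
    """
    return pred * (np.median(gt) / np.median(pred))

# Toy data (assumed values): a prediction with correct structure
# but the wrong global scale.
gt = np.array([2.0, 4.0, 8.0, 16.0])
pred_ambiguous = gt * 0.5

err_raw = abs_rel(pred_ambiguous, gt)                           # error without rescaling
err_scaled = abs_rel(median_scale(pred_ambiguous, gt), gt)      # error after median-scaling
print(err_raw, err_scaled)
```

In this toy case the raw error is large even though the relative depth ordering is perfect, and median-scaling removes it entirely. A scale-aware model is expected to score well on the raw metric directly.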
Abstract
Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to improve FSM from two perspectives: depth network structure optimization and training pipeline optimization. First, we construct a transformer-based depth network with neighbor-enhanced cross-view attention (NCA). The cross-attention modules can better aggregate the cross-view context in both global and neighboring views. Second, we formulate a transformer-based feature matching scheme with progressive training to improve the structure-from-motion (SfM) pipeline. That allows us to learn…
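The abstract says the cross-attention modules aggregate context from both global and neighboring views. The following is a minimal sketch of that idea only, under loud assumptions: the cameras form a ring, each view's tokens attend once to all other views and once to the two adjacent views, and the two results are added residually. The learned projections, multi-head structure, and the actual fusion rule of the paper's NCA module are not given in this summary, so everything here (including the function names) is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv):
    """Single-head scaled dot-product attention without learned projections.

    q:  (Nq, C) query tokens from one view
    kv: (Nk, C) key/value tokens from other views
    """
    scores = q @ kv.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv

def neighbor_enhanced_cross_view(feats):
    """feats: (V, N, C) tokens for V cameras, assumed to be arranged in a ring."""
    V, N, C = feats.shape
    out = np.empty_like(feats)
    for i in range(V):
        others = [j for j in range(V) if j != i]       # global context: all other views
        neighbors = [(i - 1) % V, (i + 1) % V]         # local context: adjacent views
        global_ctx = cross_attention(feats[i], feats[others].reshape(-1, C))
        neighbor_ctx = cross_attention(feats[i], feats[neighbors].reshape(-1, C))
        # Assumed fusion: residual sum of both contexts.
        out[i] = feats[i] + global_ctx + neighbor_ctx
    return out

# Toy input: 6 cameras, 4 tokens per view, 8 channels (all assumed sizes).
rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4, 8))
fused = neighbor_enhanced_cross_view(feats)
print(fused.shape)
```

The neighbor branch matters because adjacent cameras in a surround rig share overlapping fields of view, so their tokens are the most likely to describe the same scene content.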
Taxonomy
Topics: Modular Robots and Swarm Intelligence · Advanced Materials and Mechanics · Interactive and Immersive Displays
Methods: Softmax · Attention Is All You Need · Focus
