A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding
Yitong Dong, Yijin Li, Zhaoyang Huang, Weikang Bian, Jingbo Liu, Hujun Bao, Zhaopeng Cui, Hongsheng Li, Guofeng Zhang

TL;DR
This paper introduces a depth-range-free multi-view stereo transformer network that leverages pose embedding and multi-view disparity attention to improve 3D reconstruction accuracy without prior depth range assumptions.
Contribution
It proposes a novel multi-view stereo framework that models geometric constraints with pose embedding and long-range attention, eliminating the need for depth range priors.
Findings
Achieves state-of-the-art results on the DTU and Tanks and Temples datasets.
Effectively models multi-view geometric constraints with pose embedding.
Improves 3D reconstruction accuracy without depth range prior.
Abstract
In this paper, we propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. Unlike recent prior-free MVS methods that work in a pair-wise manner, our method simultaneously considers all the source images. Specifically, we introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information within and across multi-view images. Considering the asymmetry of the epipolar disparity flow, the key to our method lies in accurately modeling multi-view geometric constraints. We integrate pose embedding to encapsulate information such as multi-view camera poses, providing implicit geometric constraints for multi-view disparity feature fusion dominated by attention. Additionally, we construct corresponding hidden states for each source image due to significant differences in the observation quality of the same pixel in the reference…
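The idea of fusing features from all source views in one attention pass, with camera pose supplied as an embedding, can be illustrated with a minimal sketch. This is not the authors' MDA module: the function names, the linear projection `W`, the additive way the embedding is applied, and the use of world-to-camera extrinsics are all assumptions made for illustration, and the per-view hidden states mentioned in the abstract are omitted.

```python
import numpy as np


def relative_pose(T_ref, T_src):
    """Map reference-camera coordinates to source-camera coordinates.

    Assumes both inputs are 4x4 world-to-camera extrinsic matrices.
    """
    return T_src @ np.linalg.inv(T_ref)


def pose_embedding(T_rel, W):
    """Hypothetical embedding: flatten [R|t] (3x4) and project it to C dims."""
    return T_rel[:3, :].reshape(-1) @ W  # (12,) @ (12, C) -> (C,)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_view_attention(feats, pose_emb):
    """Single-head self-attention over the tokens of all source views at once.

    feats:    (V, N, C) array, N feature tokens for each of V source views
    pose_emb: (V, C) array, one embedding per source view
    Returns the fused features (V, N, C) and the attention weights (V*N, V*N).
    """
    V, N, C = feats.shape
    # Add each view's pose embedding to its tokens, then merge the views
    # so every token can attend within and across views.
    tokens = (feats + pose_emb[:, None, :]).reshape(V * N, C)
    scores = tokens @ tokens.T / np.sqrt(C)
    weights = softmax(scores, axis=-1)
    fused = weights @ tokens
    return fused.reshape(V, N, C), weights
```

Because the views are flattened into a single token sequence, every source image contributes to every output token, which is the contrast with pair-wise processing that the abstract draws. A learned model would use separate query, key, and value projections and multiple heads; they are left out here to keep the sketch short.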
Taxonomy
Topics: Optical Coherence Tomography Applications · Image Processing Techniques and Applications · Image and Signal Denoising Methods
Methods: Softmax · Attention Is All You Need
