A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Yitong Dong; Yijin Li; Zhaoyang Huang; Weikang Bian; Jingbo Liu; Hujun Bao; Zhaopeng Cui; Hongsheng Li; Guofeng Zhang

arXiv:2411.01893·cs.CV·December 10, 2024

TL;DR

This paper introduces a depth-range-free multi-view stereo transformer network that leverages pose embedding and multi-view disparity attention to improve 3D reconstruction accuracy without prior depth range assumptions.

Contribution

It proposes a novel multi-view stereo framework that models geometric constraints with pose embedding and long-range attention, eliminating the need for depth range priors.

Findings

1. Achieves state-of-the-art results on the DTU and Tanks and Temples datasets.

2. Models multi-view geometric constraints effectively through pose embedding.

3. Improves 3D reconstruction accuracy without a depth range prior.

Abstract

In this paper, we propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. Unlike recent prior-free MVS methods that work in a pair-wise manner, our method simultaneously considers all the source images. Specifically, we introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information within and across multi-view images. Considering the asymmetry of the epipolar disparity flow, the key to our method lies in accurately modeling multi-view geometric constraints. We integrate pose embedding to encapsulate information such as multi-view camera poses, providing implicit geometric constraints for multi-view disparity feature fusion dominated by attention. Additionally, we construct corresponding hidden states for each source image due to significant differences in the observation quality of the same pixel in the reference…
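To make the idea of attention-based fusion across source views with a pose embedding more concrete, here is a minimal sketch in plain Python. It is not the paper's MDA module. It assumes a single reference pixel, one feature vector per source view, a flattened relative pose per view, and a hypothetical linear projection (`weights`) that maps the pose to the feature dimension and is added to each view's feature before scaled dot-product attention. The names `pose_embedding`, `fuse_views`, and `weights` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pose_embedding(rel_pose, weights):
    """Hypothetical linear embedding: project a flattened relative pose
    (any fixed-length list of floats) to the feature dimension."""
    return [sum(w * p for w, p in zip(row, rel_pose)) for row in weights]


def fuse_views(query, view_feats, rel_poses, weights):
    """Fuse per-view features for one reference pixel.

    Assumption: the pose embedding is simply added to each source-view
    feature, and the result serves as both key and value.
    """
    dim = len(query)
    keys = []
    for feat, pose in zip(view_feats, rel_poses):
        emb = pose_embedding(pose, weights)
        keys.append([f + e for f, e in zip(feat, emb)])
    # Scaled dot-product score between the reference query and each view.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    attn = softmax(scores)
    # Attention-weighted sum of the pose-aware view features.
    fused = [sum(a * key[i] for a, key in zip(attn, keys)) for i in range(dim)]
    return attn, fused


# Toy inputs: three source views, 2-D features, 3-D "pose" vectors.
query = [1.0, 0.0]
view_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
rel_poses = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.1, 0.0, 0.0]]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

attn, fused = fuse_views(query, view_feats, rel_poses, weights)
```

In this toy setup all source views are weighted jointly in one softmax rather than pair by pair, which is the property the abstract contrasts with pair-wise prior-free methods. A learned embedding and per-view hidden states would replace the fixed `weights` in any real model.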

Videos

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding · SlidesLive

Taxonomy

Topics: Optical Coherence Tomography Applications · Image Processing Techniques and Applications · Image and Signal Denoising Methods

Methods: Softmax · Attention Is All You Need