MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth
Chenjie Cao, Xinlin Ren, Yanwei Fu

TL;DR
MVSFormer introduces a ViT-enhanced Multi-View Stereo network that improves feature representation and generalization, achieving state-of-the-art results on standard benchmarks by leveraging pre-trained transformers and a unified classification-regression approach.
Contribution
The paper proposes MVSFormer, a novel MVS network that uses pre-trained Vision Transformers to enhance feature learning and unifies classification and regression methods through a temperature-based strategy.
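The temperature-based strategy can be illustrated with a minimal sketch. It assumes a network that outputs one score per depth hypothesis per pixel and converts those scores into a depth estimate with a temperature-scaled softmax. A small temperature approaches the argmax used by classification methods, while a temperature of 1 gives the soft-argmax expectation used by regression methods. The function name `temperature_depth` and the single scalar temperature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def temperature_depth(logits, depth_hypotheses, temperature):
    """Estimate per-pixel depth from scores over D depth hypotheses.

    Illustrative sketch (assumed interface, not the MVSFormer implementation).
    logits: array of shape (D, H, W); higher scores mean more likely depths.
    depth_hypotheses: array of shape (D,) holding the candidate depths.
    temperature: t > 0; small t approaches argmax (classification),
                 t = 1 gives the plain soft-argmax expectation (regression).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = logits / temperature
    # Subtract the per-pixel maximum so exp() cannot overflow.
    scaled = scaled - scaled.max(axis=0, keepdims=True)
    probs = np.exp(scaled)
    probs /= probs.sum(axis=0, keepdims=True)
    # Expected depth under the temperature-scaled distribution: (D,) x (D, H, W) -> (H, W).
    return np.tensordot(depth_hypotheses, probs, axes=(0, 0))
```

Lowering `temperature` moves the estimate toward the single most likely hypothesis, while raising it blends more hypotheses into the expectation.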
Findings
Achieves state-of-the-art performance on the DTU dataset.
Ranks first on the Tanks-and-Temples leaderboard.
Effectively generalizes to various input resolutions.
Abstract
Feature representation learning is the key recipe for learning-based Multi-View Stereo (MVS). As the common feature extractor of learning-based MVS, vanilla Feature Pyramid Networks (FPNs) produce poor feature representations in reflective and texture-less areas, which limits the generalization of MVS. Even FPNs built on pre-trained Convolutional Neural Networks (CNNs) fail to tackle these issues. On the other hand, Vision Transformers (ViTs) have achieved prominent success in many 2D vision tasks. We therefore ask whether ViTs can facilitate feature learning in MVS. In this paper, we propose a pre-trained ViT-enhanced MVS network called MVSFormer, which learns more reliable feature representations by benefiting from the informative priors of ViTs. Fine-tuned with hierarchical ViTs that use efficient attention mechanisms, MVSFormer achieves a prominent improvement over FPN-based feature extraction.…
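One common way to inject ViT features into a CNN feature pyramid is sketched below, purely as an illustration. It assumes the ViT patch tokens form a regular grid that is coarser than the chosen FPN level by an integer `factor`; it reshapes the tokens into a feature map, upsamples them by nearest-neighbour repetition, and concatenates them with the FPN features along the channel axis. The helper names, the upsampling method, and the concatenation are hypothetical simplifications; the abstract does not specify how MVSFormer fuses the two feature streams.

```python
import numpy as np

def tokens_to_feature_map(tokens, grid_h, grid_w):
    """Reshape ViT patch tokens of shape (N, C) into a (C, grid_h, grid_w) map.

    Assumes tokens are in row-major patch order and exclude any class token.
    """
    n, c = tokens.shape
    if n != grid_h * grid_w:
        raise ValueError(f"expected {grid_h * grid_w} tokens, got {n}")
    return tokens.reshape(grid_h, grid_w, c).transpose(2, 0, 1)

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_vit_into_fpn(fpn_feat, vit_tokens, factor):
    """Hypothetical fusion: upsample ViT tokens to the FPN resolution and concatenate.

    fpn_feat: (C_fpn, H, W) feature map from one FPN level.
    vit_tokens: (N, C_vit) patch tokens with N = (H // factor) * (W // factor).
    Returns a (C_fpn + C_vit, H, W) map.
    """
    _, h, w = fpn_feat.shape
    if h % factor or w % factor:
        raise ValueError("FPN map size must be divisible by factor")
    grid = tokens_to_feature_map(vit_tokens, h // factor, w // factor)
    return np.concatenate([fpn_feat, upsample_nearest(grid, factor)], axis=0)
```

For example, an 8-channel 16×16 FPN map fused with 16 four-channel tokens at `factor=4` yields a 12-channel 16×16 map. A learned fusion (for instance bilinear upsampling followed by a convolution) would be a natural replacement for the plain concatenation used here.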
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications
