Multi-View Stereo Network with attention thin volume
Zihang Wan

TL;DR
This paper introduces an efficient multi-view stereo network that uses self-attention and group-wise correlation to construct a lightweight cost volume, improving depth inference accuracy and computational efficiency.
Contribution
It presents a novel cost volume construction method combining self-attention and group-wise correlation, enhancing efficiency and accuracy in multi-view stereo depth estimation.
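As a rough illustration of the self-attention half of that combination, the sketch below applies standard scaled dot-product self-attention to a single feature map, so every pixel can aggregate information from every other pixel (the long-range dependency the abstract refers to). It is a generic formulation, not the paper's architecture: the function name `self_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all tensor sizes are assumptions made for the example.

```python
import numpy as np


def self_attention(feat, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention over a feature map.

    feat: array of shape (C, H, W).
    w_q, w_k, w_v: hypothetical projection matrices of shape (C, D).
    Returns the attended features (D, H, W) and the attention weights (H*W, H*W).
    """
    channels, height, width = feat.shape
    # Flatten the spatial grid so each pixel becomes one token of size C.
    tokens = feat.reshape(channels, height * width).T  # (N, C)

    queries = tokens @ w_q  # (N, D)
    keys = tokens @ w_k     # (N, D)
    values = tokens @ w_v   # (N, D)

    # Pairwise similarity between all pixel positions, scaled by sqrt(D).
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    attended = weights @ values  # (N, D)
    return attended.T.reshape(-1, height, width), weights


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((16, 4, 5))  # illustrative sizes only
w_q, w_k, w_v = (rng.standard_normal((16, 8)) for _ in range(3))
attended, attn_weights = self_attention(feature_map, w_q, w_k, w_v)
```

The attention matrix grows with the square of the number of pixels, which is one reason such modules are typically applied to low-resolution feature maps.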
Findings
Achieves better depth accuracy with reduced memory usage.
Demonstrates improved performance on benchmark datasets.
Validates the effectiveness of the attention thin volume approach.
Abstract
We propose an efficient multi-view stereo (MVS) network for inferring depth values from multiple RGB images. Recent studies have shown that mapping the geometric relationships of real space into a neural network is an essential topic in the MVS problem. Specifically, these methods focus on how to express the correspondence between different views by constructing a good cost volume. In this paper, we propose a more complete cost volume construction approach that builds on previous work. First, we introduce a self-attention mechanism to fully aggregate the dominant information from the input images and accurately model long-range dependencies, so as to selectively aggregate reference features. Second, we introduce group-wise correlation into feature aggregation, which greatly reduces the memory and computation burden. Meanwhile, this method enhances the information…
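To make the memory saving from group-wise correlation concrete, here is a minimal NumPy sketch of the generic technique: the channels of two feature maps are split into groups, and each group is reduced to a single similarity value per pixel. It assumes the source features have already been warped into the reference view at one depth hypothesis; the function name `groupwise_correlation`, the group count, and the tensor sizes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def groupwise_correlation(ref_feat, src_feat, num_groups):
    """Group-wise correlation between two feature maps.

    ref_feat, src_feat: arrays of shape (C, H, W). src_feat is assumed to be
    already warped into the reference view for one depth hypothesis.
    Returns a similarity map of shape (num_groups, H, W).
    """
    if ref_feat.shape != src_feat.shape:
        raise ValueError("reference and source features must share a shape")
    channels, height, width = ref_feat.shape
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")

    channels_per_group = channels // num_groups
    # Element-wise product, then average within each channel group.
    product = ref_feat * src_feat
    grouped = product.reshape(num_groups, channels_per_group, height, width)
    return grouped.mean(axis=1)


rng = np.random.default_rng(0)
ref = rng.standard_normal((32, 4, 5))  # illustrative sizes only
src = rng.standard_normal((32, 4, 5))
similarity = groupwise_correlation(ref, src, num_groups=8)
```

With these example sizes the per-hypothesis similarity has 8 channels instead of 32, so a cost volume built from it is four times smaller along the channel axis than one that keeps every feature channel.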
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
