A-TVSNet: Aggregated Two-View Stereo Network for Multi-View Stereo Depth Estimation
Sizhang Dai, Weibing Huang

TL;DR
A-TVSNet is a novel multi-view stereo depth estimation network that combines initial depth prediction, a refinement process leveraging photometric and geometric cues, and an attention-based multi-view aggregation for improved accuracy.
Contribution
It introduces an attentional multi-view aggregation framework and a refinement network that uses photometric and geometric information, which together improve multi-view stereo depth estimation.
Findings
Outperforms existing methods on various MVS datasets.
Produces high-quality, accurate depth maps.
Enables efficient information exchange among stereo pairs.
Abstract
We propose a learning-based network for depth map estimation from multi-view stereo (MVS) images. Our proposed network consists of three sub-networks: 1) a base network for initial depth map estimation from an unstructured stereo image pair, 2) a novel refinement network that leverages both photometric and geometric information, and 3) an attentional multi-view aggregation framework that enables efficient information exchange and integration among different stereo image pairs. The proposed network, called A-TVSNet, is evaluated on various MVS datasets and produces high-quality depth maps that outperform those of competing approaches. Our code is available at https://github.com/daiszh/A-TVSNet.
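To make the idea of attentional aggregation across stereo pairs more concrete, here is a minimal sketch of softmax-weighted fusion of per-pair feature maps. It is a generic illustration, not the authors' architecture: the function name `attention_aggregate`, the array shapes, and the assumption that each pair comes with a per-pixel relevance score are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def attention_aggregate(pair_features, pair_scores):
    """Fuse per-pair feature maps using softmax attention over the pair axis.

    pair_features: array of shape (N, H, W, C), one feature map per stereo pair
                   (hypothetical layout, chosen for illustration).
    pair_scores:   array of shape (N, H, W), unnormalized per-pixel relevance
                   scores for each pair (assumed to be produced elsewhere).
    Returns the fused (H, W, C) map and the (N, H, W) attention weights.
    """
    pair_features = np.asarray(pair_features, dtype=np.float64)
    pair_scores = np.asarray(pair_scores, dtype=np.float64)

    # Subtract the per-pixel maximum before exponentiating for numerical stability.
    shifted = pair_scores - pair_scores.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    # Normalize so the weights over the N pairs sum to 1 at every pixel.
    weights /= weights.sum(axis=0, keepdims=True)

    # Weighted sum over the pair axis; weights broadcast across channels.
    fused = (weights[..., None] * pair_features).sum(axis=0)
    return fused, weights
```

With equal scores this reduces to a plain average over pairs, while a strongly dominant score lets a single pair determine the fused value at that pixel. In a learned system the scores would come from a trainable module rather than being supplied by hand.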
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Advanced Image Processing Techniques
