V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints
Nathaniel Burgdorfer, Philippos Mordohai

TL;DR
V-FUSE is a learning-based framework that enhances multi-view stereo depth maps by integrating volumetric visibility constraints and a depth search window estimation sub-network, leading to more accurate 3D reconstructions.
Contribution
V-FUSE introduces an end-to-end trainable architecture for depth map fusion that incorporates long-range visibility constraints and a jointly trained depth search window estimation sub-network.
Findings
Substantial accuracy improvements in the fused depth and confidence maps on MVS datasets.
Depth consensus and visibility-constraint violations are learned directly from the data.
Removes the need to fine-tune fusion parameters.
Abstract
We introduce a learning-based depth map fusion framework that accepts a set of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm as input and improves them. This is accomplished by integrating volumetric visibility constraints that encode long-range surface relationships across different views into an end-to-end trainable architecture. We also introduce a depth search window estimation sub-network trained jointly with the larger fusion sub-network to reduce the depth hypothesis search space along each ray. Our method learns to model depth consensus and violations of visibility constraints directly from the data, effectively removing the necessity of fine-tuning fusion parameters. Extensive experiments on MVS datasets show substantial improvements in the accuracy of the output fused depth and confidence maps.
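To make the idea of a per-ray search window concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the window is given as a per-pixel center and radius (in V-FUSE these come from a learned sub-network), and it replaces the learned consensus model with a simple hand-crafted vote count. The function names `window_hypotheses` and `support_count`, the tolerance `tol`, and the assumption that the input depth maps have already been re-rendered into the reference view are all hypothetical choices made for illustration.

```python
import numpy as np

def window_hypotheses(center, radius, num_planes):
    """Sample num_planes depths per pixel, evenly spaced in
    [center - radius, center + radius]. Inputs are (H, W) maps."""
    # Offsets in [-1, 1], shaped (D, 1, 1) so they broadcast over (H, W).
    steps = np.linspace(-1.0, 1.0, num_planes).reshape(-1, 1, 1)
    return center[None] + steps * radius[None]  # (D, H, W)

def support_count(hypotheses, depths_in_ref, tol):
    """For each hypothesis, count the input depth maps that agree within tol.
    Hand-crafted stand-in for learned depth consensus (an assumption)."""
    # hypotheses: (D, H, W); depths_in_ref: (V, H, W) -> diff: (V, D, H, W)
    diff = np.abs(depths_in_ref[:, None] - hypotheses[None])
    return (diff < tol).sum(axis=0)  # (D, H, W)

# Toy 2x2 example: window centered at depth 5.0 with radius 1.0.
center = np.full((2, 2), 5.0)
radius = np.full((2, 2), 1.0)
hyps = window_hypotheses(center, radius, num_planes=5)

# Three input depth maps, assumed already rendered into the reference view.
depths = np.stack([
    np.full((2, 2), 5.1),
    np.full((2, 2), 4.9),
    np.full((2, 2), 5.6),
])
votes = support_count(hyps, depths, tol=0.25)

# Pick, per pixel, the hypothesis with the most support.
best_idx = votes.argmax(axis=0)[None]                   # (1, H, W)
best = np.take_along_axis(hyps, best_idx, axis=0)[0]    # (H, W)
```

Restricting the hypotheses to a narrow window means the same number of planes covers a smaller depth range, so each plane is spaced more finely than it would be if the full scene depth range were sampled.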
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications
