Revisiting PatchMatch Multi-View Stereo for Urban 3D Reconstruction
Marco Orsingher, Paolo Zani, Paolo Medici, Massimo Bertozzi

TL;DR
This paper presents an enhanced PatchMatch Multi-View Stereo pipeline for urban 3D reconstruction that integrates visual SLAM initialization, a novel depth-normal consistency loss, and global refinement to achieve state-of-the-art results on the KITTI dataset.
Contribution
It introduces a comprehensive urban 3D reconstruction pipeline that combines SLAM initialization, a novel depth-normal consistency loss, and a global refinement step.
Findings
Achieves state-of-the-art performance on the KITTI dataset
Effectively balances local PatchMatch optimization with global consistency
Outperforms classical MVS algorithms and monocular depth networks
Abstract
In this paper, a complete pipeline for image-based 3D reconstruction of urban scenarios is proposed, based on PatchMatch Multi-View Stereo (MVS). Input images are first fed into an off-the-shelf visual SLAM system to extract camera poses and sparse keypoints, which are used to initialize PatchMatch optimization. Then, pixelwise depths and normals are iteratively computed in a multi-scale framework with a novel depth-normal consistency loss term and a global refinement algorithm to balance the inherently local nature of PatchMatch. Finally, a large-scale point cloud is generated by back-projecting multi-view consistent estimates into 3D. The proposed approach is carefully evaluated against both classical MVS algorithms and monocular depth networks on the KITTI dataset, showing state-of-the-art performance.
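To make the final back-projection step concrete, the following is a minimal sketch that lifts a per-pixel depth map into camera-frame 3D points. It is not the authors' implementation: it assumes a standard pinhole camera with intrinsics fx, fy, cx, cy, and the function name back_project and the optional valid mask (standing in for the multi-view consistency check) are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def back_project(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    valid: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Point3D]:
    """Back-project a depth map into camera-frame 3D points.

    Assumes a pinhole model: a pixel (u, v) with depth d maps to
    X = (u - cx) * d / fx, Y = (v - cy) * d / fy, Z = d.
    `valid` is a hypothetical per-pixel mask marking estimates that
    passed some consistency check; pixels without a positive depth
    are always skipped.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # no usable depth estimate at this pixel
            if valid is not None and not valid[v][u]:
                continue  # rejected by the (assumed) consistency mask
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


if __name__ == "__main__":
    toy_depth = [[2.0, 0.0], [4.0, 1.0]]
    for p in back_project(toy_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5):
        print(p)
```

Points produced this way are expressed in the camera frame. Building one large-scale cloud would additionally require transforming them into a common world frame with each camera pose, for example the poses estimated by the SLAM front end.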
Taxonomy
Topics: Advanced Vision and Imaging · Remote Sensing and LiDAR Applications · Robotics and Sensor-Based Localization
