TL;DR
This paper shows that depth maps inferred by a CNN from single RGB images can significantly improve local feature matching. The depths are used to pre-warp images and rectify perspective distortions before SIFT and BRISK features are extracted, which makes 3D re-localization and multi-view reconstruction more robust.
Contribution
The paper introduces a method that uses single-image depth predictions to improve local feature extraction and matching. It works from the RGB images alone and avoids the prerequisites that made some earlier approaches impractical.
Findings
Enhanced feature matching with CNN-based depth pre-warping
Improved robustness in multi-view reconstruction
Effective even when cameras view the same scene from opposite directions
Abstract
Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information have come with prerequisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, which significantly enhances SIFT and BRISK features and enables more good matches, even when cameras are looking at the same scene but in opposite directions.
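The pre-warping idea in the abstract can be illustrated with a small sketch. It is not the authors' pipeline, only one plausible reading of "rectify perspective distortions": back-project a few pixels with their predicted depths, estimate the normal of the local surface plane, and build the homography H = K R K^-1 that rotates a virtual camera to face that plane head-on. The pinhole intrinsics (fx, fy, cx, cy), the toy depth function, and all function names here are assumptions made for illustration.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, x):
    return tuple(sum(A[i][k] * x[k] for k in range(3)) for i in range(3))

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with its depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three points, oriented toward the camera."""
    n = normalize(cross(sub(p1, p0), sub(p2, p0)))
    return n if n[2] < 0 else tuple(-x for x in n)

def rotation_aligning(a, b):
    """Rotation taking unit vector a onto unit vector b (undefined if a == -b)."""
    v, c = cross(a, b), dot(a, b)
    k = 1.0 / (1.0 + c)
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    vx2 = matmul(vx, vx)
    return [[(1.0 if i == j else 0.0) + vx[i][j] + k * vx2[i][j]
             for j in range(3)] for i in range(3)]

def rectifying_homography(normal, fx, fy, cx, cy):
    """Homography K R K^-1 that turns the virtual camera to look along -normal."""
    R = rotation_aligning(tuple(-x for x in normal), (0.0, 0.0, 1.0))
    K = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    return matmul(matmul(K, R), K_inv), R

def warp_pixel(H, u, v):
    """Apply homography H to a pixel and dehomogenize."""
    x, y, w = matvec(H, (u, v, 1.0))
    return (x / w, y / w)

# Toy setup (assumed values): a slanted plane z = 2 + 0.5 * x seen by a pinhole camera.
fx = fy = 500.0
cx, cy = 320.0, 240.0

def toy_depth(u, v):
    # Stand-in for a CNN depth prediction; exact for the plane above.
    return 2.0 / (1.0 - 0.5 * (u - cx) / fx)

pixels = [(300.0, 200.0), (360.0, 210.0), (320.0, 280.0)]
points = [backproject(u, v, toy_depth(u, v), fx, fy, cx, cy) for u, v in pixels]
normal = plane_normal(*points)
H, R = rectifying_homography(normal, fx, fy, cx, cy)
warped = [warp_pixel(H, u, v) for u, v in pixels]
```

After the rotation, every point on the plane has the same depth in the virtual camera, so the surface appears fronto-parallel and features extracted from the warped image are less affected by perspective foreshortening. With real predicted depths the normal would need a robust fit over many pixels, since the abstract notes the depths are flawed. In a full pipeline, H could be passed to an image-warping routine such as OpenCV's cv2.warpPerspective before running SIFT or BRISK; that step is left out here to keep the sketch dependency-free.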
