TL;DR
This paper compares classical and learned feature matching methods for camera-to-3D building model localization, showing learned methods significantly outperform traditional ones in accuracy and robustness on challenging datasets.
Contribution
It provides a comprehensive evaluation of classical versus learned feature matching techniques specifically for semantic 3D building localization tasks.
Findings
Learnable methods outperform classical approaches in accuracy and robustness.
Learnable methods yield more inliers and more accurate pose estimates.
Results demonstrate the potential of deep learning for improved visual localization.
Abstract
Feature matching is a necessary step for many computer vision and photogrammetry applications such as image registration, structure-from-motion, and visual localization. Classical handcrafted methods such as SIFT feature detection and description combined with nearest neighbour matching and RANSAC outlier removal have been state-of-the-art for mobile mapping cameras. With recent advances in deep learning, learnable methods have been introduced and proven to have better robustness and performance under complex conditions. Despite their growing adoption, a comprehensive comparison between classical and learnable feature matching methods for the specific task of semantic 3D building camera-to-model matching is still missing. This submission systematically evaluates the effectiveness of different feature-matching techniques in visual localization using textured CityGML LoD2 models. We use…
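The classical pipeline named in the abstract (descriptor matching by nearest neighbour, then RANSAC outlier removal) can be illustrated with a minimal, standard-library sketch. This is not the paper's implementation. The function names `nn_match` and `ransac_translation`, the ratio-test threshold, the toy descriptors, and the 2D translation model are all assumptions made for illustration. A real system would use SIFT descriptors and a pose or homography model instead.

```python
import math
import random


def nn_match(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with a ratio test (threshold is an assumption)."""
    matches = []
    for i, da in enumerate(desc_a):
        # Rank descriptors in desc_b by Euclidean distance to da.
        ranked = sorted(range(len(desc_b)), key=lambda j: math.dist(da, desc_b[j]))
        best, second = ranked[0], ranked[1]
        # Keep the match only if the best candidate is clearly closer than the runner-up.
        if math.dist(da, desc_b[best]) < ratio * math.dist(da, desc_b[second]):
            matches.append((i, best))
    return matches


def ransac_translation(pts_a, pts_b, matches, thresh=2.0, iters=100, seed=0):
    """RANSAC with a toy 2D translation model (a stand-in for a real pose model)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        # A translation is fully determined by a single sampled match.
        i, j = rng.choice(matches)
        tx = pts_b[j][0] - pts_a[i][0]
        ty = pts_b[j][1] - pts_a[i][1]
        # Collect every match that agrees with this translation within thresh.
        inliers = [
            (a, b)
            for a, b in matches
            if math.hypot(pts_a[a][0] + tx - pts_b[b][0],
                          pts_a[a][1] + ty - pts_b[b][1]) <= thresh
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Toy data: five keypoints with one-hot descriptors (hypothetical values).
pts_a = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 5)]
# The first four points are shifted by (3, 4); the fifth lands at an inconsistent location.
pts_b = [(3, 4), (13, 4), (3, 14), (13, 14), (40, -20)]
desc_a = [tuple(1.0 if k == i else 0.0 for k in range(5)) for i in range(5)]
desc_b = list(desc_a)

matches = nn_match(desc_a, desc_b)
inliers = ransac_translation(pts_a, pts_b, matches)
print(len(matches), len(inliers))
```

In this toy setup the fifth match passes descriptor matching but disagrees with the geometry shared by the other four, so RANSAC discards it. Learned matchers replace the detection, description, or matching stages, while a geometric verification step of this kind typically remains.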
