TL;DR
This paper introduces an interpretable, scene-specific image embedding that efficiently predicts the 3D surface overlap between images, reducing search complexity and enabling faster, human-interpretable image matching.
Contribution
The authors propose a novel non-metric box embedding that captures asymmetric image relations and scene-specific similarity, improving efficiency and interpretability over traditional geometric verification methods.
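To make the asymmetry concrete, here is a minimal sketch of how two axis-aligned boxes can encode a directional relation. It is an illustration under stated assumptions, not the authors' implementation: the function names `box_volume` and `asymmetric_overlap`, the 2D toy boxes, and the choice of normalizing the intersection by the first box's volume are all hypothetical. In the paper's setting the boxes would be predicted by a learned, scene-specific model rather than written by hand.

```python
from math import prod

def box_volume(lo, hi):
    """Volume of an axis-aligned box; a non-positive side length gives zero."""
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))

def asymmetric_overlap(box_a, box_b):
    """Fraction of box_a's volume that lies inside box_b (assumed measure)."""
    lo_a, hi_a = box_a
    lo_b, hi_b = box_b
    # The intersection of two axis-aligned boxes is itself an axis-aligned box.
    inter_lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    inter_hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    vol_a = box_volume(lo_a, hi_a)
    return box_volume(inter_lo, inter_hi) / vol_a if vol_a > 0 else 0.0

# Hypothetical embeddings: a wide shot and a close-up contained within it.
wide = ([0.0, 0.0], [4.0, 4.0])
close_up = ([1.0, 1.0], [3.0, 3.0])

print(asymmetric_overlap(close_up, wide))  # 1.0: the close-up lies entirely inside the wide box
print(asymmetric_overlap(wide, close_up))  # 0.25: only a quarter of the wide box is covered
```

The two directions give different values, which a symmetric distance in a metric embedding cannot express. That gap between the two values is the kind of signal that could indicate which image is the close-up of the other.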
Findings
Achieves competitive image-matching accuracy
Faster and simpler than existing geometric verification methods
Provides human-interpretable insights into image relations
Abstract
To what extent are two images picturing the same 3D surfaces? Even when this is a known scene, the answer typically requires an expensive search across scale space, with matching and geometric verification of large sets of local features. This expense is further multiplied when a query image is evaluated against a gallery, e.g. in visual relocalization. While we don't obviate the need for geometric verification, we propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup. Our approach measures the asymmetric relation between two images. The model then learns a scene-specific measure of similarity, from training examples with known 3D visible-surface overlaps. The result is that we can quickly identify, for example, which test image is a close-up version of another, and by what scale factor. Subsequently, local features need only be detected…