Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence
Felipe Cadar, Guilherme Potje, Renato Martins, Cédric Demonceaux, Erickson R. Nascimento

TL;DR
This paper injects semantic cues from foundation vision models (such as DINOv2) into existing local feature descriptors, improving keypoint matching while removing the need for image pairs at inference time.
Contribution
The paper proposes a method to incorporate semantic reasoning into existing descriptors, improving matching performance while keeping matching fast and independent of image pairs.
Findings
Average 29% improvement in camera localization accuracy
Comparable accuracy to LightGlue and LoFTR on benchmarks
Descriptors can be cached for fast similarity search
Abstract
Visual correspondence is a crucial step in key computer vision tasks, including camera localization, image registration, and structure from motion. The most effective techniques for matching keypoints currently involve using learned sparse or dense matchers, which need pairs of images. These neural networks have a good general understanding of features from both images, but they often struggle to match points from different semantic areas. This paper presents a new method that uses semantic cues from foundation vision model features (like DINOv2) to enhance local feature matching by incorporating semantic reasoning into existing descriptors. Therefore, the learned descriptors do not require image pairs at inference time, allowing feature caching and fast matching using similarity search, unlike learned matchers. We present adapted versions of six existing descriptors, with an average…
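The caching and similarity-search idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes descriptors are stored per image as NumPy arrays of shape (num_keypoints, dim), and the names `cache`, `l2_normalize`, and `mutual_nearest_neighbors` are hypothetical. Mutual nearest-neighbor matching on cosine similarity is used here as one common form of similarity search; the paper may use a different matching rule.

```python
import numpy as np


def l2_normalize(desc, eps=1e-12):
    """Scale each descriptor row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, eps)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return an (M, 2) array of index pairs (i, j) where i and j are each other's best match."""
    sim = l2_normalize(desc_a) @ l2_normalize(desc_b).T  # (Na, Nb) cosine similarities
    nn_ab = sim.argmax(axis=1)  # best match in B for every descriptor in A
    nn_ba = sim.argmax(axis=0)  # best match in A for every descriptor in B
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a  # keep only pairs that agree in both directions
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)


# Hypothetical cache: descriptors are computed once per image and reused for every pairing.
# Synthetic data stands in for real descriptors: image B holds the same five descriptors
# as image A, in reverse order and with a little noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(5, 8))
cache = {
    "img_a": base,
    "img_b": base[::-1] + 0.01 * rng.normal(size=(5, 8)),
}

matches = mutual_nearest_neighbors(cache["img_a"], cache["img_b"])
print(matches)
```

Because each image's descriptors are computed independently, matching a new pair only costs one matrix product over cached arrays, whereas a learned matcher has to run its network on every image pair.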
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
