Leveraging Semantic Cues from Foundation Vision Models for Enhanced Local Feature Correspondence

Felipe Cadar; Guilherme Potje; Renato Martins; Cédric Demonceaux; Erickson R. Nascimento

arXiv:2410.09533·cs.CV·October 15, 2024

TL;DR

This paper introduces an approach that uses semantic cues from foundation vision models to improve local feature matching. Because the resulting descriptors do not require image pairs at inference time, they can be cached and matched with fast similarity search.

Contribution

The paper proposes a method that incorporates semantic reasoning into existing descriptors, improving their performance while keeping matching fast and independent of image pairs.

Findings

01. Average 29% improvement in camera localization accuracy

02. Comparable accuracy to LightGlue and LoFTR on benchmarks

03. Descriptors can be cached for fast similarity search

Abstract

Visual correspondence is a crucial step in key computer vision tasks, including camera localization, image registration, and structure from motion. The most effective techniques for matching keypoints currently involve using learned sparse or dense matchers, which need pairs of images. These neural networks have a good general understanding of features from both images, but they often struggle to match points from different semantic areas. This paper presents a new method that uses semantic cues from foundation vision model features (like DINOv2) to enhance local feature matching by incorporating semantic reasoning into existing descriptors. Therefore, the learned descriptors do not require image pairs at inference time, allowing feature caching and fast matching using similarity search, unlike learned matchers. We present adapted versions of six existing descriptors, with an average…
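
The caching and similarity-search matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes descriptors are stored as L2-normalized NumPy arrays in a plain dictionary keyed by image name, and it uses mutual nearest-neighbor matching on cosine similarity as one common form of similarity search. The names `cache`, `l2_normalize`, and `mutual_nearest_neighbors` are hypothetical, and the descriptors here are synthetic random vectors rather than the output of any real extractor.

```python
import numpy as np


def l2_normalize(desc, eps=1e-8):
    """Scale each descriptor (row) to unit length so dot product equals cosine similarity."""
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, eps)


def mutual_nearest_neighbors(desc_a, desc_b):
    """Return an (M, 2) array of index pairs (i, j) that are each other's best match."""
    sim = desc_a @ desc_b.T          # (Na, Nb) cosine similarity matrix
    nn_ab = sim.argmax(axis=1)       # best match in B for every descriptor in A
    nn_ba = sim.argmax(axis=0)       # best match in A for every descriptor in B
    idx_a = np.arange(desc_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a   # keep pairs that agree in both directions
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)


# Hypothetical cache: descriptors are computed once per image and reused
# for every later pairing, with no pair-dependent network at match time.
rng = np.random.default_rng(0)
cache = {}

# Synthetic stand-ins for per-image descriptors (5 keypoints, 16 dimensions).
base = l2_normalize(rng.standard_normal((5, 16)))
perm = np.array([2, 0, 4, 1, 3])     # img_b sees the same points in a shuffled order
cache["img_a"] = base
cache["img_b"] = l2_normalize(base[perm] + 0.01 * rng.standard_normal((5, 16)))

# Matching only needs the cached arrays.
matches = mutual_nearest_neighbors(cache["img_a"], cache["img_b"])
print(matches)
```

Each row of `matches` pairs a keypoint index in `img_a` with one in `img_b`. Since descriptors are computed per image, matching a new query against N cached images costs N matrix products and no additional network passes, which is the practical contrast with matchers that must process each image pair jointly.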

Code & Models

Repositories

verlab/DescriptorReasoning_ACCV_2024 (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications