Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features
Qiang Wang

TL;DR
This paper analyzes how attention-based sparse image matching models are trained for different local features, identifies a critical, previously overlooked design choice that affects LightGlue's performance, and proposes a fine-tuning approach that yields a universal, detector-agnostic matcher.
Contribution
The paper shows that, within transformer-based matching, the detector rather than the descriptor is often the primary cause of performance differences, and it introduces a method for fine-tuning existing matchers on keypoints from a diverse set of detectors.
Findings
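Detectors, rather than descriptors, are often the primary cause of performance differences between local features.
Fine-tuning an existing matcher on keypoints from a diverse set of detectors yields a universal, detector-agnostic model.
Used as a zero-shot matcher for novel detectors, the fine-tuned model matches or exceeds the accuracy of models trained specifically for those features.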
Abstract
We revisit the problem of training attention-based sparse image matching models for various local features. We first identify one critical design choice that has been previously overlooked, which significantly impacts the performance of the LightGlue model. We then investigate the role of detectors and descriptors within the transformer-based matching framework, finding that detectors, rather than descriptors, are often the primary cause for performance difference. Finally, we propose a novel approach to fine-tune existing image matching models using keypoints from a diverse set of detectors, resulting in a universal, detector-agnostic model. When deployed as a zero-shot matcher for novel detectors, the resulting model achieves or exceeds the accuracy of models specifically trained for those features. Our findings offer valuable insights for the deployment of transformer-based matching…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Multimodal Machine Learning Applications
