TL;DR
This paper introduces a large-scale pre-training framework for cross-modality image matching, enabling models to generalize across diverse imaging modalities and outperform existing methods on multiple unseen tasks.
Contribution
The authors propose a synthetic cross-modal training approach that significantly improves the generalization of image matching models across various modalities.
Findings
A model trained with the proposed framework generalizes well to more than eight unseen cross-modality tasks.
The approach outperforms existing methods, both those designed for generalization and those tailored to specific tasks.
The method enhances multi-modality analysis in scientific and AI applications.
Abstract
Image matching, which aims to identify corresponding pixel locations between images, is crucial in a wide range of scientific disciplines, aiding in image registration, fusion, and analysis. In recent years, deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large numbers of correspondences. However, when dealing with images captured under different imaging modalities that result in significant appearance changes, the performance of these algorithms often deteriorates due to the scarcity of annotated cross-modal training data. This limitation hinders applications in various fields that rely on multiple image modalities to obtain complementary information. To address this challenge, we propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals, incorporating diverse data from various…
