MIFNet: Learning Modality-Invariant Features for Generalizable Multimodal Image Matching
Yepeng Liu, Zhichao Sun, Baosheng Yu, Yitian Zhao, Bo Du, Yongchao Xu, Jun Cheng

TL;DR
MIFNet is a novel neural network that learns modality-invariant features for multimodal image matching using only single-modality training data, enabling robust zero-shot performance across diverse modalities.
Contribution
The paper introduces MIFNet, a new approach that leverages pre-trained features and novel modules to learn modality-invariant descriptors without requiring multimodal training data.
Findings
MIFNet achieves strong zero-shot generalization on multiple multimodal datasets.
The method outperforms existing approaches in multimodal image matching tasks.
It effectively leverages pre-trained models to enhance robustness across modalities.
Abstract
Many keypoint detection and description methods have been proposed for image matching or registration. While these methods demonstrate promising performance for single-modality image matching, they often struggle with multimodal data because the descriptors trained on single-modality data tend to lack robustness against the non-linear variations present in multimodal data. Extending such methods to multimodal image matching often requires well-aligned multimodal data to learn modality-invariant descriptors. However, acquiring such data is often costly and impractical in many real-world scenarios. To address this challenge, we propose a modality-invariant feature learning network (MIFNet) to compute modality-invariant features for keypoint descriptions in multimodal image matching using only single-modality training data. Specifically, we propose a novel latent feature aggregation module…
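For readers unfamiliar with how keypoint descriptors are used in matching, the following is a minimal sketch of mutual nearest-neighbor matching under cosine similarity. It is a generic baseline, not MIFNet's own method, and it shows why modality-invariant descriptors matter: if the same physical point yields dissimilar descriptors in two modalities, this kind of matching fails. The function names and toy descriptors are illustrative assumptions, and descriptors are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def mutual_nearest_neighbors(desc_a, desc_b):
    """Return (i, j) pairs where desc_a[i] and desc_b[j] are each other's best match."""
    # Full similarity matrix: sim[i][j] compares descriptor i of image A with j of image B.
    sim = [[cosine(u, v) for v in desc_b] for u in desc_a]
    # Best candidate in B for every descriptor in A, and vice versa.
    best_b = [max(range(len(desc_b)), key=lambda j: row[j]) for row in sim]
    best_a = [max(range(len(desc_a)), key=lambda i: sim[i][j])
              for j in range(len(desc_b))]
    # Keep only pairs that agree in both directions.
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i]

# Toy descriptors (hypothetical values): three keypoints in image A, two in image B.
desc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
desc_b = [[0.1, 0.9], [0.9, 0.1]]
matches = mutual_nearest_neighbors(desc_a, desc_b)
```

In this toy case the third descriptor of image A is left unmatched, because neither descriptor in image B selects it as its nearest neighbor.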
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Diffusion · Balanced Selection
