MINIMA: Modality Invariant Image Matching

Jiangwei Ren; Xingyu Jiang; Zizhuo Li; Dingkang Liang; Xin Zhou; Xiang Bai

arXiv:2412.19412 · cs.CV · April 1, 2025

TL;DR

MINIMA introduces a scalable data generation approach and a new dataset to improve universal image matching across multiple modalities, significantly outperforming existing methods in diverse cross-modal scenarios.

Contribution

The paper presents a unified framework and a large-scale multimodal dataset, enabling improved generalization and performance in cross-modal image matching tasks.

Findings

01. MINIMA outperforms baseline methods in both in-domain and zero-shot cross-modal matching.

02. The generated MD-syn dataset effectively transfers the diversity of RGB data to multimodal matching.

03. The approach achieves superior results across 19 different cross-modal cases.

Abstract

Image matching across both views and modalities plays a critical role in multimodal perception. In practice, the modality gap caused by different imaging systems and styles poses great challenges to the matching task. Existing works try to extract invariant features for specific modalities and train on limited datasets, and therefore generalize poorly. In this paper, we present MINIMA, a unified image matching framework for multiple cross-modal cases. Rather than pursuing fancy modules, MINIMA aims to enhance universal performance by scaling up data. To this end, we propose a simple yet effective data engine that can freely produce a large dataset containing multiple modalities, rich scenarios, and accurate matching labels. Specifically, we scale up the modalities from cheap but rich RGB-only matching data by means of generative models. Under this setting, the…
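The data-engine idea in the abstract can be illustrated with a small sketch. It assumes that the generated image stays pixel-aligned with its RGB source, so the matching labels of the RGB pair can be reused unchanged. The names `MatchingPair`, `toy_translate`, and `make_cross_modal_pair` are hypothetical, and the intensity inversion is only a stand-in for a generative model; none of this is taken from the MINIMA code base.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]            # grayscale image stored as rows of 0-255 ints
Match = Tuple[int, int, int, int]  # (x_a, y_a, x_b, y_b) pixel correspondence


@dataclass
class MatchingPair:
    image_a: Image
    image_b: Image
    matches: List[Match]
    modality_b: str = "rgb"


def toy_translate(image: Image) -> Image:
    """Hypothetical stand-in for a generative RGB-to-X model.

    It only inverts intensities. The property that matters here is that
    the output is pixel-aligned with the input (same size, no warp).
    """
    return [[255 - value for value in row] for row in image]


def make_cross_modal_pair(
    pair: MatchingPair,
    translate: Callable[[Image], Image],
    modality: str,
) -> MatchingPair:
    """Translate image_b into another modality and reuse the RGB labels."""
    translated = translate(pair.image_b)
    # Label reuse is only valid if the pixel grid is unchanged.
    same_height = len(translated) == len(pair.image_b)
    same_widths = same_height and all(
        len(new_row) == len(old_row)
        for new_row, old_row in zip(translated, pair.image_b)
    )
    if not same_widths:
        raise ValueError("translation changed the image size; labels cannot be reused")
    return MatchingPair(pair.image_a, translated, list(pair.matches), modality)


# A tiny 2x2 RGB-style pair with two known correspondences.
rgb_pair = MatchingPair(
    image_a=[[10, 20], [30, 40]],
    image_b=[[20, 10], [40, 30]],
    matches=[(0, 0, 1, 0), (1, 1, 0, 1)],
)
pseudo_pair = make_cross_modal_pair(rgb_pair, toy_translate, "pseudo-infrared")
```

In this sketch the appearance of `image_b` changes while `matches` is copied as-is, which is the sense in which labels from an RGB-only dataset could carry over to a generated modality. A translation that resizes or warps the image would break that assumption, hence the size check.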

Code & Models

Repositories

LSXI7/MINIMA (PyTorch, official)

Models

🤗 lsxi77777/MINIMA (model, ♡ 4)

Datasets

lsxi77777/MegaDepth-Syn (dataset, 2.1k downloads)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques