SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality

Chenyang Lei; Liyi Chen; Jun Cen; Xiao Chen; Zhen Lei; Felix Heide; Qifeng Chen; Zhaoxiang Zhang

arXiv:2411.18669 · cs.CV · December 2, 2024

TL;DR

SimCMF introduces a straightforward cross-modal fine-tuning framework that adapts vision foundation models trained on RGB images to various other imaging modalities, significantly enhancing segmentation performance across different sensors.

Contribution

The paper proposes a novel cross-modal alignment module and constructs a new benchmark for evaluating performance transfer from RGB to other imaging modalities.

Findings

01 Improves segmentation mIoU from 22.15% to 53.88% on average across modalities.

02 Outperforms existing baseline methods in cross-modal transfer tasks.

03 Demonstrates the potential of foundation models in sensor data enhancement.
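
The first finding is reported in mean intersection-over-union (mIoU). As a reminder of what that number measures, here is a minimal sketch of the standard per-class definition on flattened label maps. The function name `mean_iou` and the toy labels are illustrative assumptions; this page does not describe the paper's exact evaluation protocol, which may average over masks or samples instead.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the target.

    `pred` and `target` are flattened per-pixel class labels of equal length.
    Classes absent from both are skipped so they do not distort the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both agree on class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either one says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
# Per-class IoU is 1/3, 2/3 and 1/2, so the mean is 0.5.
```

An mIoU of 53.88% therefore means that, on average, a little over half of the combined predicted and ground-truth region overlaps.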

Abstract

Foundation models such as ChatGPT and Sora, which are trained on data at a huge scale, have had a revolutionary social impact. However, it is extremely challenging for sensors in many other fields to collect data at a scale comparable to natural images in order to train strong foundation models. To this end, this work presents a simple and effective framework, SimCMF, to study an important problem: cross-modal fine-tuning from vision foundation models trained on natural RGB images to other imaging modalities with different physical properties (e.g., polarization). In SimCMF, we conduct a thorough analysis of different basic components, starting from the most naive design, and ultimately propose a novel cross-modal alignment module to address the modality misalignment problem. We apply SimCMF to a representative vision foundation model, the Segment Anything Model (SAM), to support any evaluated new imaging modality. Given the…
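
The abstract on this page is truncated and does not describe the internals of the alignment module. The sketch below only illustrates the modality misalignment problem it refers to: an input with a different number of channels cannot be fed directly to a model that expects 3-channel RGB. One naive bridge is a per-pixel linear map (equivalent to a 1x1 convolution) from the modality's channels to three RGB-like channels. Everything here is an assumption for illustration and is not the paper's module: the function name `project_channels`, the 4-channel input, and the hand-picked weights, which in practice would be learned during fine-tuning.

```python
from typing import List

Pixel = List[float]        # one value per channel
Image = List[List[Pixel]]  # height x width x channels, as nested lists


def project_channels(image: Image, weight: List[List[float]], bias: List[float]) -> Image:
    """Apply the same linear map to every pixel (a 1x1 convolution).

    `weight` has shape [out_channels][in_channels]; `bias` has length out_channels.
    """
    in_channels = len(weight[0])
    out: Image = []
    for row in image:
        out_row = []
        for pixel in row:
            if len(pixel) != in_channels:
                raise ValueError("pixel channel count does not match weight")
            out_row.append([
                sum(w * x for w, x in zip(w_row, pixel)) + b
                for w_row, b in zip(weight, bias)
            ])
        out.append(out_row)
    return out


# Hypothetical 4-channel image (for example, four polarizer angles), 1 x 2 pixels.
polar = [[[0.2, 0.4, 0.6, 0.8], [1.0, 0.0, 1.0, 0.0]]]

# Hypothetical weights mapping 4 input channels to 3 RGB-like channels.
weight = [
    [0.25, 0.25, 0.25, 0.25],  # average of all four channels
    [0.5, 0.0, 0.5, 0.0],      # average of channels 0 and 2
    [0.0, 0.5, 0.0, 0.5],      # average of channels 1 and 3
]
bias = [0.0, 0.0, 0.0]

rgb_like = project_channels(polar, weight, bias)
# Each output pixel now has 3 channels and matches the input shape of an RGB model.
```

A fixed or naively initialized projection like this is the kind of baseline a learned alignment module would be compared against.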

Code & Models

Repositories

mt-cly/simcmf (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques · Image Retrieval and Classification Techniques