SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality

Chenyang Lei; Liyi Chen; Jun Cen; Xiao Chen; Zhen Lei; Felix Heide; Ziwei Liu; Qifeng Chen; Zhaoxiang Zhang

arXiv:2409.08083·cs.CV·September 13, 2024

TL;DR

This paper introduces SimMAT, a framework that enables the transfer of vision foundation models trained on RGB images to other image modalities, demonstrating significant improvements in segmentation performance across various sensors.

Contribution

The paper presents SimMAT, a framework built around a novel modality-agnostic transfer layer (MAT) that enables cross-modal transfer from RGB-trained foundation models to diverse image modalities.

Findings

01

SimMAT improves segmentation mIoU from 22.15% to 53.88% on average across modalities.

02

SimMAT outperforms baseline methods in cross-modal transfer tasks.

03

The authors construct a new benchmark for evaluating transfer learning to non-RGB image modalities.
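Finding 01 is stated in mean intersection-over-union (mIoU); the reported change from 22.15% to 53.88% is an average gain of 31.73 percentage points. For reference, here is a minimal pure-Python sketch of the standard per-class mIoU definition over flattened label maps. The function name `mean_iou` and the convention of skipping classes absent from both prediction and ground truth are assumptions; the benchmark's exact averaging protocol (per image, per class, per modality) is not described in this summary.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes for two flattened label maps of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for k in range(num_classes):
        # Pixels labelled k in both maps vs. pixels labelled k in either map.
        inter = sum(1 for p, t in zip(pred, target) if p == k and t == k)
        union = sum(1 for p, t in zip(pred, target) if p == k or t == k)
        if union > 0:  # skip classes absent from both maps (a common convention)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    target = [0, 1, 1, 1]
    # Class 0: IoU = 1/2; class 1: IoU = 2/3; mean = 7/12
    print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.5833
```

Multiplying the result by 100 gives the percentage form used in the findings.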

Abstract

Foundation models such as ChatGPT and Sora, trained on data at a huge scale, have had a revolutionary social impact. However, it is extremely challenging for sensors in many different fields to collect data at a scale comparable to natural-image datasets in order to train strong foundation models. To this end, this work presents SimMAT, a simple and effective framework for studying an open problem: the transferability of vision foundation models trained on natural RGB images to other image modalities with different physical properties (e.g., polarization). SimMAT consists of a modality-agnostic transfer layer (MAT) and a pretrained foundation model. We apply SimMAT to a representative vision foundation model, the Segment Anything Model (SAM), to support any evaluated new image modality. Given the absence of relevant benchmarks, we construct a new benchmark to evaluate transfer learning performance. Our experiments…
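The summary does not specify how MAT is built. As a hedged illustration only, assume MAT can be approximated by a learnable per-pixel linear projection (equivalent to a 1×1 convolution) from an arbitrary number of input channels to the three channels an RGB-pretrained backbone expects. Under that assumption, the SimMAT data flow might look like the sketch below. `mat_project`, `simmat_forward`, and the stand-in `backbone` are hypothetical names, not the API of the mt-cly/simmat repository.

```python
from typing import Callable, List, Sequence

# Hypothetical layout: image[channel][row][col]; any number of channels.
Image = List[List[List[float]]]


def mat_project(image: Image, weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> Image:
    """Hypothetical transfer layer: a per-pixel linear map from C input
    channels to len(weights) output channels (a 1x1 convolution)."""
    in_ch = len(image)
    if len(weights) != len(bias):
        raise ValueError("need one bias term per output channel")
    if any(len(row) != in_ch for row in weights):
        raise ValueError("each weight row needs one entry per input channel")
    height, width = len(image[0]), len(image[0][0])
    out: Image = []
    for row, b in zip(weights, bias):
        # Each output channel mixes all input channels at the same pixel.
        plane = [[b + sum(row[c] * image[c][y][x] for c in range(in_ch))
                  for x in range(width)]
                 for y in range(height)]
        out.append(plane)
    return out


def simmat_forward(image: Image, weights: Sequence[Sequence[float]],
                   bias: Sequence[float],
                   backbone: Callable[[Image], object]) -> object:
    """Project to the backbone's expected channel count, then call the
    pretrained model unchanged (a stand-in for an RGB backbone such as SAM)."""
    return backbone(mat_project(image, weights, bias))


if __name__ == "__main__":
    # Toy 4-channel, 1x2-pixel input from a hypothetical non-RGB sensor.
    image = [[[1.0, 2.0]], [[0.0, 1.0]], [[2.0, 0.0]], [[1.0, 1.0]]]
    weights = [[1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0, 0.5, 0]]  # 3 x 4
    bias = [0.0, 0.0, 0.0]
    print(mat_project(image, weights, bias))
    # Stand-in "backbone" that only reports how many channels it received.
    print(simmat_forward(image, weights, bias, backbone=len))
```

This pure-Python version only shows the shape of the computation: a modality-specific layer absorbs the channel mismatch so the pretrained model can be reused as-is. Whether the backbone is kept frozen or also fine-tuned in SimMAT is not stated in this summary, and a practical implementation would use a tensor library.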

Code & Models

Repositories

mt-cly/simmat
PyTorch · Official

Taxonomy

Topics: Satellite Image Processing and Photogrammetry