SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang

TL;DR
This paper introduces SimMAT, a framework that enables the transfer of vision foundation models trained on RGB images to other image modalities, demonstrating significant improvements in segmentation performance across various sensors.
Contribution
The paper presents SimMAT, a framework that pairs a modality-agnostic transfer (MAT) layer with a pretrained foundation model to enable cross-modal transfer from RGB-trained foundation models to diverse image modalities.
Findings
SimMAT improves segmentation mIoU from 22.15% to 53.88% on average across modalities.
SimMAT outperforms baseline methods in cross-modal transfer tasks.
The authors construct a new benchmark for evaluating transfer learning to non-RGB image modalities.
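For readers unfamiliar with the metric behind the 22.15% and 53.88% figures, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed. It follows the standard textbook definition. This summary does not state the paper's exact evaluation protocol (for example, whether IoU is averaged per class, per mask, or per image), so the function name `mean_iou`, the flat label lists, and the choice to skip absent classes are all assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened per-pixel class labels (an assumed layout).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps (an assumed convention)
            ious.append(inter / union)

    if not ious:
        raise ValueError("no class in range(num_classes) occurs in pred or target")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Under this definition, the reported change from 22.15% to 53.88% is a gain of 31.73 percentage points in average mIoU.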
Abstract
Foundation models like ChatGPT and Sora that are trained on a huge scale of data have made a revolutionary social impact. However, it is extremely challenging for sensors in many different fields to collect similar scales of natural images to train strong foundation models. To this end, this work presents a simple and effective framework SimMAT to study an open problem: the transferability from vision foundation models trained on natural RGB images to other image modalities of different physical properties (e.g., polarization). SimMAT consists of a modality-agnostic transfer layer (MAT) and a pretrained foundation model. We apply SimMAT to a representative vision foundation model Segment Anything Model (SAM) to support any evaluated new image modality. Given the absence of relevant benchmarks, we construct a new benchmark to evaluate the transfer learning performance. Our experiments…
Taxonomy
Topics: Satellite Image Processing and Photogrammetry
