Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation
Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

TL;DR
This paper introduces PaRe, a novel end-to-end method that uses gradual intermediate modality generation to improve cross-modal fine-tuning, especially for data-scarce and highly discrepant modalities, by bridging modality gaps effectively.
Contribution
PaRe is the first approach to employ a gating mechanism and patch replacement for constructing intermediate modalities, enhancing transfer stability and addressing data scarcity in cross-modal fine-tuning.
Findings
Outperforms existing methods on three benchmarks
Effectively bridges modality gaps with intermediate data
Improves transfer stability and data efficiency
Abstract
Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequences and cosmic rays, poses challenges due to the significant modality discrepancy and the scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal fine-tuning, aiming to transfer a large-scale pretrained model to various target modalities. PaRe employs a gating mechanism to select key patches from both source and target data. Through a modality-agnostic Patch Replacement scheme, these patches are preserved and combined to construct data-rich intermediate modalities ranging from easy to hard. Through gradual intermediate modality generation, we can not only effectively bridge the modality gap to enhance stability and transferability of…
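The patch replacement idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`mix_patches`, `ratio_schedule`), the linear schedule, and the reading of "easy" as mostly-source are all assumptions made for illustration. In PaRe the gate is part of the method itself; here the gate scores are simply passed in as numbers.

```python
def mix_patches(source_patches, target_patches, target_scores, target_ratio):
    """Keep the highest-scoring target patches and fill the rest with source patches.

    Hypothetical helper: target_scores stands in for the output of a gate,
    and target_ratio is the fraction of positions taken from the target sample.
    """
    n = len(target_patches)
    if not (len(source_patches) == n == len(target_scores)):
        raise ValueError("expected one source patch and one score per target patch")
    if not 0.0 <= target_ratio <= 1.0:
        raise ValueError("target_ratio must be in [0, 1]")
    k = round(target_ratio * n)
    # Positions of the k target patches with the largest gate scores.
    ranked = sorted(range(n), key=lambda i: target_scores[i], reverse=True)
    keep = set(ranked[:k])
    return [target_patches[i] if i in keep else source_patches[i] for i in range(n)]


def ratio_schedule(num_stages):
    """Assumed linear schedule from source-only (0.0) to target-only (1.0)."""
    if num_stages < 2:
        raise ValueError("need at least two stages")
    return [stage / (num_stages - 1) for stage in range(num_stages)]


# Toy data: strings stand in for patch embeddings.
source = ["s0", "s1", "s2", "s3"]
target = ["t0", "t1", "t2", "t3"]
scores = [0.1, 0.9, 0.4, 0.7]  # made-up gate scores for the target patches

stages = [mix_patches(source, target, scores, r) for r in ratio_schedule(4)]
for stage in stages:
    print(stage)
```

Each stage replaces more source patches with target patches, so a model fine-tuned on the stages in order would see inputs that drift from the pretraining modality toward the target modality.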
Taxonomy
Topics: Photonic and Optical Devices · Advanced Optical Imaging Technologies
