Learning Modality Knowledge Alignment for Cross-Modality Transfer

Wenxuan Ma; Shuang Li; Lincan Cai; Jingxuan Kang

arXiv:2406.18864 · cs.CV · June 28, 2024 · 2 cites

TL;DR

This paper investigates how the gap between different data modalities affects transfer learning and introduces MoNA, a meta-learning method to align modality knowledge, improving cross-modality transfer performance.

Contribution

It formalizes the modality gap as knowledge misalignment and proposes MoNA to reduce this gap through target data transformation, enhancing transfer effectiveness.

Findings

01. Larger modality gaps lead to less effective knowledge transfer.

02. MoNA improves knowledge reuse in cross-modality transfer.

03. In experiments, MoNA outperforms existing finetuning methods.

Abstract

Cross-modality transfer aims to leverage large pretrained models to complete tasks that may not belong to the modality of the pretraining data. Existing works have had some success in extending classical finetuning to cross-modal scenarios, yet we still lack an understanding of how the modality gap influences the transfer. In this work, we conduct a series of experiments focusing on source representation quality during transfer, revealing a connection between a larger modality gap and less knowledge reuse, which means ineffective transfer. We then formalize the gap as the knowledge misalignment between modalities using the conditional distribution P(Y|X). To address this problem, we present Modality kNowledge Alignment (MoNA), a meta-learning approach that learns a target data transformation to reduce the modality knowledge discrepancy ahead of the transfer. Experiments show that out…

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems