MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning

Yi Xin; Junlong Du; Qiang Wang; Ke Yan; Shouhong Ding

arXiv:2312.08636 · cs.CV · December 15, 2023 · 5 cites

TL;DR

This paper introduces MmAP, a multi-modal alignment prompt for CLIP that enhances multi-task learning by aligning text and visual modalities, enabling efficient parameter use and improved performance across tasks.

Contribution

The paper proposes a novel multi-modal alignment prompt (MmAP) for CLIP, enabling effective multi-task learning with minimal trainable parameters and preserving task-specific features.

Findings

01 Achieves significant performance gains over full fine-tuning.

02 Requires only about 0.09% of the trainable parameters used by full fine-tuning.

03 Demonstrates effectiveness on large multi-task datasets.

Abstract

Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CLIP, which exhibits robust zero-shot generalization capability. Recently, parameter-efficient transfer learning methods have been extensively explored with CLIP for adapting to downstream tasks, where prompt tuning showcases strong potential. Nevertheless, these methods solely fine-tune a single modality (text or visual), disrupting the modality structure of CLIP. In this paper, we first propose Multi-modal Alignment Prompt (MmAP) for CLIP, which aligns text and visual modalities during…
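
The abstract is cut off before it explains how the alignment is carried out. One plausible way to keep text and visual prompts coupled is to derive both from a single shared source prompt through small per-modality projections, so that neither modality is tuned in isolation. This is offered as an illustrative assumption and not as the authors' method. The sketch below shows that idea in plain Python. All names (`source_prompt`, `to_text`, `to_visual`, `make_prompts`) and all sizes are hypothetical, and the token sequences are random stand-ins for the embeddings of a frozen encoder.

```python
import random

random.seed(0)

# Illustrative sizes only (assumptions, not values from the paper).
PROMPT_LEN, SOURCE_DIM = 4, 8
TEXT_DIM, VISUAL_DIM = 16, 24


def random_matrix(rows, cols):
    """Small random matrix standing in for learnable or frozen weights."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def make_prompts(source, w_text, w_visual):
    """Derive the text prompt and the visual prompt from one shared source prompt."""
    return matmul(source, w_text), matmul(source, w_visual)


# The only trainable pieces in this sketch: a shared prompt and two projections.
source_prompt = random_matrix(PROMPT_LEN, SOURCE_DIM)
to_text = random_matrix(SOURCE_DIM, TEXT_DIM)
to_visual = random_matrix(SOURCE_DIM, VISUAL_DIM)

text_prompt, visual_prompt = make_prompts(source_prompt, to_text, to_visual)

# Random placeholders for token embeddings produced by frozen encoders.
text_tokens = random_matrix(5, TEXT_DIM)
image_patches = random_matrix(9, VISUAL_DIM)

# Prompt tokens are prepended to each modality's token sequence.
text_input = text_prompt + text_tokens
image_input = visual_prompt + image_patches

trainable = PROMPT_LEN * SOURCE_DIM + SOURCE_DIM * (TEXT_DIM + VISUAL_DIM)
print(len(text_input), len(image_input), trainable)
```

Because both prompts are functions of the same `source_prompt`, a gradient step on either modality's loss would move the other modality's prompt as well. That coupling is the intuition behind tuning the two modalities jointly. The trainable parameter count here covers only the prompt and the two projections, while the encoders stay frozen.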

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training