Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
Haixin Wang, Xinlong Yang, Jianlong Chang, Dian Jin, Jinan Sun, Shikun Zhang, Xiao Luo, Qi Tian

TL;DR
This paper introduces Aurora, a lightweight prompt tuning framework for large-scale multimodal models. It drastically reduces the number of trainable parameters, improves modality alignment, and outperforms full fine-tuning on multiple benchmarks.
Contribution
Aurora is a multimodal prompt tuning method with very few trainable parameters. It introduces new modules that strengthen modality alignment, targeting both efficiency and effectiveness in cross-modal tasks.
Findings
Outperforms state-of-the-art methods on six benchmarks.
Tunes only about 0.04% of the pre-trained model's parameters.
Surpasses full fine-tuning in the reported experiments.
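As a quick consistency check, the 0.04% figure can be combined with the abstract's figure of 0.1M trainable parameters to estimate the size of the frozen backbone. This is a minimal sketch of that arithmetic only. Both inputs are rounded, so the implied backbone size is approximate and is not a number reported by the authors. The helper names below are hypothetical.

```python
def trainable_fraction(n_trainable: float, n_total: float) -> float:
    """Share of the backbone's parameters that are updated during tuning."""
    return n_trainable / n_total


def implied_total(n_trainable: float, fraction: float) -> float:
    """Backbone size implied by a trainable-parameter count and its stated fraction."""
    return n_trainable / fraction


# Rounded figures quoted on this page: ~0.1M trainable parameters, ~0.04% of the backbone.
backbone = implied_total(0.1e6, 0.04 / 100)
print(f"implied backbone size: {backbone / 1e6:.0f}M parameters")
```

This prints `implied backbone size: 250M parameters`, so a backbone on the order of a few hundred million parameters is consistent with both figures.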
Abstract
Driven by the progress of large-scale pre-training, parameter-efficient transfer learning has gained immense popularity across different subfields of Artificial Intelligence. The core idea is to adapt the model to downstream tasks with only a small set of parameters. Recently, researchers have applied such proven techniques to multimodal tasks and achieved promising results. However, two critical issues remain unresolved: how to further reduce complexity with a lightweight design, and how to boost alignment between modalities under an extremely low parameter budget. In this paper, we propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome these challenges. Considering the redundancy in existing architectures, we first use mode approximation to generate 0.1M trainable parameters for multimodal prompt tuning, which explores the low intrinsic dimension…
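The following sketch illustrates the idea of approximating weight updates with very few parameters. It assumes that "mode approximation" refers to a rank-R CANDECOMP/PARAFAC (CP) factorization of the weight increments of N layers, stacked into a third-order tensor. The abstract above is truncated, so this formulation, the factor names `U`, `V`, `P`, `lam`, and the sizes in the usage line are illustrative assumptions, not Aurora's published implementation or configuration.

```python
import numpy as np


def cp_reconstruct(U, V, P, lam):
    """Rebuild a (d_out, d_in, N) tensor from rank-R CP factors.

    U: (d_out, R), V: (d_in, R), P: (N, R), lam: (R,)
    Entry [i, j, n] = sum_r lam[r] * U[i, r] * V[j, r] * P[n, r]
    """
    return np.einsum("r,ir,jr,nr->ijn", lam, U, V, P)


def layer_update(U, V, P, lam, n):
    """Weight increment for layer n alone: U @ diag(lam * P[n]) @ V.T.

    Scaling the columns of U by lam * P[n] avoids building the full tensor.
    """
    return (U * (lam * P[n])) @ V.T


def cp_param_count(d_out, d_in, n_layers, rank):
    """Trainable parameters: three factor matrices plus the per-component scales."""
    return rank * (d_out + d_in + n_layers) + rank


# Hypothetical sizes: 768-dim weights shared across 12 layers, rank 64.
print(cp_param_count(768, 768, 12, 64), 768 * 768 * 12)
```

With these illustrative sizes, the factorized update needs 99,136 trainable values instead of 7,077,888 for the full stacked tensor. The design point is that `U` and `V` are shared across all layers, while each layer contributes only one row of `P`. This sharing keeps the count low when the updates lie in a low intrinsic dimension.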
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
