BMIP: Bi-directional Modality Interaction Prompt Learning for VLM

Song-Lin Lv; Yu-Yang Chen; Zhi Zhou; Ming Yang; Lan-Zhe Guo

arXiv:2501.07769·cs.LG·January 15, 2025

TL;DR

This paper introduces BMIP, a novel bi-directional modality interaction prompt learning method for vision-language models that enhances inter-modal alignment and generalization across diverse tasks.

Contribution

The paper proposes BMIP, a new prompt learning approach that dynamically weights bi-modal information for improved inter-modal interaction and generalization in VLMs.

Findings

1. BMIP outperforms state-of-the-art methods across multiple evaluation paradigms.

2. BMIP enhances inter-modal consistency and trainability.

3. Flexible integration with other prompt methods improves performance.

Abstract

Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for its ability to adapt pre-trained VLMs to specific downstream tasks. However, existing studies mainly focus on single-modal prompts or uni-directional modality interaction, overlooking the powerful alignment effects resulting from the interaction between the vision and language modalities. To this end, we propose a novel prompt learning method called Bi-directional Modality Interaction Prompt (BMIP), which dynamically weights bi-modal information through learning the information of the attention layer, enhancing trainability and inter-modal consistency compared to simple information aggregation methods. To evaluate the effectiveness of prompt learning…
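The abstract contrasts learned, dynamic weighting of bi-modal information with simple aggregation. The idea of a weighted two-way exchange between prompt sets can be sketched as below. This is a minimal illustration and not the paper's implementation: the function name `weighted_aggregate`, the projection matrices `W_v2t` and `W_t2v`, the `scores_*` arrays, and all dimensions are hypothetical. In particular, the scores here are plain placeholder arrays, whereas the abstract says the weights are learned from attention-layer information.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def weighted_aggregate(own, other_projected, scores):
    """Mix a modality's own prompts with prompts projected from the other modality.

    own:             (n_prompts, d) prompts of this modality
    other_projected: (n_prompts, d) other modality's prompts, already mapped to d dims
    scores:          (n_prompts, 2) hypothetical per-prompt scores; column 0 is for
                     `own`, column 1 for `other_projected` (an assumption, see lead-in)
    """
    w = softmax(scores, axis=-1)  # each row sums to 1
    return w[:, :1] * own + w[:, 1:] * other_projected

rng = np.random.default_rng(0)
n_prompts, d_text, d_vis = 4, 8, 6  # illustrative sizes only

text_prompts = rng.normal(size=(n_prompts, d_text))
vision_prompts = rng.normal(size=(n_prompts, d_vis))

# Hypothetical linear maps between the two prompt spaces.
W_v2t = rng.normal(size=(d_vis, d_text)) * 0.1
W_t2v = rng.normal(size=(d_text, d_vis)) * 0.1

# Placeholder scores; zeros give equal weighting of both sources.
scores_text = np.zeros((n_prompts, 2))
scores_vis = np.zeros((n_prompts, 2))

# Bi-directional exchange: each side receives information from the other.
new_text = weighted_aggregate(text_prompts, vision_prompts @ W_v2t, scores_text)
new_vis = weighted_aggregate(vision_prompts, text_prompts @ W_t2v, scores_vis)
```

With all-zero scores the mix reduces to a plain average, which corresponds to the "simple aggregation" baseline; non-uniform scores let each prompt lean more on its own modality or on the other one.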

Taxonomy

Topics: Elevator Systems and Control · Power Transformer Diagnostics and Insulation · EEG and Brain-Computer Interfaces

Methods: Softmax · Attention Is All You Need · Focus