BMIP: Bi-directional Modality Interaction Prompt Learning for VLM
Song-Lin Lv, Yu-Yang Chen, Zhi Zhou, Ming Yang, Lan-Zhe Guo

TL;DR
This paper introduces BMIP, a novel bi-directional modality interaction prompt learning method for vision-language models that enhances inter-modal alignment and generalization across diverse tasks.
Contribution
The paper proposes BMIP, a new prompt learning approach that dynamically weights bi-modal information for improved inter-modal interaction and generalization in VLMs.
Findings
BMIP outperforms state-of-the-art methods across multiple evaluation paradigms.
BMIP enhances inter-modal consistency and trainability.
Flexible integration with other prompt methods improves performance.
Abstract
Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for its ability to adapt pre-trained VLMs to specific downstream tasks. However, existing studies mainly focus on single-modal prompts or uni-directional modality interaction, overlooking the powerful alignment effects resulting from the interaction between the vision and language modalities. To this end, we propose a novel prompt learning method called Bi-directional Modality Interaction Prompt (BMIP), which dynamically weights bi-modal information through learning the information of the attention layer, enhancing trainability and inter-modal consistency compared to simple information aggregation methods. To evaluate the effectiveness of prompt learning…
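To make the idea of "dynamically weighting bi-modal information" concrete, here is a minimal illustrative sketch. It is not the paper's formulation: the function names, the scalar scores (stand-ins for whatever statistic is learned from the attention layer), and the assumption that both prompts already share one dimensionality are all hypothetical. It only shows how a softmax-weighted blend differs from plain aggregation, applied in both directions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_prompts(own, other, own_score, other_score):
    """Blend two equal-length prompt vectors with softmax weights.

    own_score / other_score are hypothetical scalars; in a real model they
    would be derived from attention-layer information, not set by hand.
    """
    w_own, w_other = softmax([own_score, other_score])
    return [w_own * a + w_other * b for a, b in zip(own, other)]


# Toy prompts (assumed to be already projected to a common width).
vision_prompt = [1.0, 0.0]
language_prompt = [0.0, 1.0]

# Bi-directional: each modality's prompt is updated using the other's.
new_vision = fuse_prompts(vision_prompt, language_prompt,
                          own_score=2.0, other_score=0.0)
new_language = fuse_prompts(language_prompt, vision_prompt,
                            own_score=0.0, other_score=0.0)
```

With equal scores the blend reduces to a plain average, which is the "simple aggregation" baseline; unequal scores let one modality dominate the update.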
Taxonomy
Topics: Elevator Systems and Control · Power Transformer Diagnostics and Insulation · EEG and Brain-Computer Interfaces
Methods: Softmax · Attention Is All You Need · Focus
