Low-rank Prompt Interaction for Continual Vision-Language Retrieval
Weicai Yan, Ye Wang, Wang Lin, Zirun Guo, Zhou Zhao, Tao Jin

TL;DR
This paper introduces Low-rank Prompt Interaction (LPI), a method for continual vision-language retrieval that explicitly models cross-modal and cross-task interactions using low-rank decomposition and contrastive learning, improving performance while adding few parameters.
Contribution
The paper proposes a new low-rank prompt interaction framework that explicitly models cross-modal and cross-task interactions in continual learning for vision-language retrieval.
Findings
Performance improvements on two retrieval tasks
Effective with minimal additional parameters
Enhanced cross-modal and cross-task understanding
Abstract
Research on continual learning in multi-modal tasks has been receiving increasing attention. However, most existing work overlooks explicit cross-modal and cross-task interactions. In this paper, we propose Low-rank Prompt Interaction (LPI) to address this general problem of multi-modal understanding, which considers both cross-modal and cross-task interactions. Specifically, for the former, we employ multi-modal correlation modules for corresponding Transformer layers. Considering that the number of trainable parameters scales with the number of layers and tasks, we propose low-rank interaction-augmented decomposition to avoid memory explosion while enhancing the cross-modal association through sharing and separating common-specific low-rank factors. In addition, due to the multi-modal semantic differences carried by the low-rank initialization, we adopt hierarchical low-rank…
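The parameter-saving idea in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the class `LowRankPrompts`, the factor names (`common`, `specific`, `proj`), the additive way of combining common and specific factors, and all sizes are assumptions chosen only to show how sharing a common low-rank factor across modalities and keeping a small modality-specific factor reduces the number of trainable prompt parameters.

```python
import numpy as np


class LowRankPrompts:
    """Hypothetical low-rank prompt store (illustrative, not the LPI code).

    The prompt for (layer, modality) is built as
        (common[layer] + specific[modality, layer]) @ proj[modality]
    so every prompt matrix has rank at most `rank`.
    """

    def __init__(self, num_layers, num_modalities, prompt_len, dim, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Factor shared by all modalities (the "common" part).
        self.common = rng.normal(0.0, 0.02, (num_layers, prompt_len, rank))
        # Small per-modality correction (the "specific" part).
        self.specific = rng.normal(
            0.0, 0.02, (num_modalities, num_layers, prompt_len, rank)
        )
        # Per-modality projection from rank space to the embedding dimension.
        self.proj = rng.normal(0.0, 0.02, (num_modalities, rank, dim))

    def prompt(self, layer, modality):
        """Return the (prompt_len, dim) prompt for one layer and modality."""
        left = self.common[layer] + self.specific[modality, layer]
        return left @ self.proj[modality]

    def num_params(self):
        """Total number of stored prompt parameters."""
        return self.common.size + self.specific.size + self.proj.size


def full_prompt_params(num_layers, num_modalities, prompt_len, dim):
    """Parameter count when every (layer, modality) prompt is stored in full."""
    return num_layers * num_modalities * prompt_len * dim


# Assumed sizes: 12 layers, 2 modalities (image, text), 8 prompt tokens,
# embedding dimension 768, rank 4.
demo = LowRankPrompts(num_layers=12, num_modalities=2, prompt_len=8, dim=768, rank=4)
full_count = full_prompt_params(12, 2, 8, 768)   # 147456
low_rank_count = demo.num_params()               # 384 + 768 + 6144 = 7296
```

Under these assumed sizes the factorized store holds 7,296 parameters instead of 147,456 for one task. In a continual setting the full count would also grow with the number of tasks, which is the scaling problem the abstract refers to.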
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Softmax · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · ADaptive gradient method with the OPTimal convergence rate · Multi-Head Attention · Position-Wise Feed-Forward Layer
