Low-rank Prompt Interaction for Continual Vision-Language Retrieval

Weicai Yan; Ye Wang; Wang Lin; Zirun Guo; Zhou Zhao; Tao Jin

arXiv:2501.14369 · cs.CV · January 27, 2025

TL;DR

This paper introduces Low-rank Prompt Interaction (LPI), a novel method for continual vision-language retrieval that explicitly models cross-modal and cross-task interactions using low-rank decomposition and contrastive learning, improving performance with minimal parameters.

Contribution

The paper proposes a new low-rank prompt interaction framework that explicitly models cross-modal and cross-task interactions in continual learning for vision-language retrieval.

Findings

01 Performance improvements on two retrieval tasks
02 Effective with minimal additional parameters
03 Enhanced cross-modal and cross-task understanding

Abstract

Research on continual learning in multi-modal tasks has been receiving increasing attention. However, most existing work overlooks explicit cross-modal and cross-task interactions. In this paper, we propose Low-rank Prompt Interaction (LPI) to address this general problem of multi-modal understanding; LPI considers both cross-modal and cross-task interactions. Specifically, for the former, we employ multi-modal correlation modules for corresponding Transformer layers. Because the number of trainable parameters scales with the number of layers and tasks, we propose low-rank interaction-augmented decomposition to avoid memory explosion while enhancing cross-modal association through sharing and separating common-specific low-rank factors. In addition, due to the multi-modal semantic differences carried by the low-rank initialization, we adopt hierarchical low-rank…
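The parameter argument in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each prompt of shape (length x dim) is factored into a common factor of shape (length x rank) shared by both modalities and a modality-specific factor of shape (rank x dim). The function names, the factor shapes, and the sizes used here are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply an (n x r) matrix by an (r x d) matrix, both given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols, rng, scale=0.02):
    """Small Gaussian initialisation; the scale is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def full_prompt_params(layers, tasks, modalities, length, dim):
    """One dense (length x dim) prompt per layer, task, and modality."""
    return layers * tasks * modalities * length * dim


def low_rank_prompt_params(layers, tasks, modalities, length, dim, rank):
    """Assumed factorisation: a common factor shared across modalities
    plus one specific factor per modality, for every (layer, task) pair."""
    common = layers * tasks * length * rank
    specific = layers * tasks * modalities * rank * dim
    return common + specific


# Build the prompts for a single (layer, task) pair under the assumed factorisation.
rng = random.Random(0)
length, dim, rank = 8, 64, 4
common = rand_matrix(length, rank, rng)  # shared by vision and text
specific = {m: rand_matrix(rank, dim, rng) for m in ("vision", "text")}
prompts = {m: matmul(common, s) for m, s in specific.items()}

full = full_prompt_params(12, 5, 2, length, dim)
low = low_rank_prompt_params(12, 5, 2, length, dim, rank)
print(f"dense prompts: {full} parameters, low-rank prompts: {low} parameters")
```

Because both modalities reuse the same common factor, the vision and text prompts are tied together through it, which is one simple way to read "sharing and separating common-specific low-rank factors". The saving grows as the rank shrinks relative to the prompt length and hidden size.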

Code & Models

Repositories

kelvin-ywc/lpi (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Softmax · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · ADaptive gradient method with the OPTimal convergence rate · Multi-Head Attention · Position-Wise Feed-Forward Layer