Memory-Inspired Temporal Prompt Interaction for Text-Image Classification
Xinyao Yu, Hao Sun, Ziwei Niu, Rui Qin, Zhenjia Bai, Yen-Wei Chen, Lanfen Lin

TL;DR
This paper introduces Memory-Inspired Temporal Prompt Interaction (MITP), a prompt-based strategy for multimodal models that draws on human memory processes to make interaction between the vision and language modalities more efficient.
Contribution
It proposes a prompt interaction method, modeled on the stages of human memory, that improves multimodal learning efficiency while adding few parameters and little memory overhead.
Findings
Achieves competitive results on multiple datasets.
Uses only about 1% of the pre-trained model's parameters.
Reduces memory and computational costs while maintaining performance.
Abstract
In recent years, large-scale pre-trained multimodal models (LMMs) have emerged to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models on downstream tasks. Hence, prompt-based interaction strategies have been studied to align modalities more efficiently. In this context, we propose a novel prompt-based multimodal interaction strategy inspired by human memory strategy, namely Memory-Inspired Temporal Prompt Interaction (MITP). Our proposed method involves two stages, as in human memory strategy: the acquiring stage, and the consolidation and activation stage. We utilize temporal prompts on intermediate layers to imitate the acquiring stage, leverage similarity-based prompt interaction to imitate…
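The abstract is cut off before it describes the interaction step in detail, so the following is only an illustrative sketch of what a similarity-based prompt interaction could look like, not the authors' implementation. It assumes that both modalities keep prompt vectors of the same dimension, that similarity is cosine similarity, and that each prompt is updated with a softmax-weighted sum of the other modality's prompts. The function names, the residual update, and the `temperature` parameter are all hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a_unit = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return a_unit @ b_unit.T  # shape (n, m)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def interact_prompts(own_prompts, other_prompts, temperature=1.0):
    """Hypothetical interaction step (an assumption, not from the paper).

    Each prompt of one modality attends to the prompts of the other
    modality by cosine similarity and adds the weighted mixture to itself.
    """
    similarity = cosine_similarity_matrix(own_prompts, other_prompts)
    weights = softmax(similarity / temperature, axis=-1)
    return own_prompts + weights @ other_prompts


# Toy prompts: 4 text prompts and 6 image prompts, each of dimension 16.
rng = np.random.default_rng(0)
text_prompts = rng.normal(size=(4, 16))
image_prompts = rng.normal(size=(6, 16))

updated_text = interact_prompts(text_prompts, image_prompts)
updated_image = interact_prompts(image_prompts, text_prompts)
```

In a layered model, a step like this would presumably be applied to the prompts attached to selected intermediate layers while the pre-trained backbone stays frozen; which layers and how the result is fed back are not specified in the text above.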
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text and Document Classification Technologies
Methods: ALIGN
