Dynamic Cross-Modal Prompt Generation for Multimodal Continual Instruction Tuning

Tao Hu, Da-Wei Zhou

arXiv:2605.10765 · cs.CV · May 12, 2026

TL;DR

This paper introduces DRAPE, a prompt-learning framework that generates instance-specific prompts for multimodal continual instruction tuning, with the aim of improving adaptability and reducing catastrophic forgetting in multimodal large language models (MLLMs).

Contribution

DRAPE synthesizes cross-modal, instance-specific prompts with a query-based approach, rather than selecting or combining task-level modules, for continual learning in multimodal large language models.

Findings

01. DRAPE achieves state-of-the-art performance on Multimodal Continual Instruction Tuning (MCIT) benchmarks.

02. It mitigates catastrophic forgetting during sequential task learning.

03. Instance-specific prompt generation outperforms task-level prompt methods.

Abstract

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, yet real-world deployment often requires continual capability expansion across sequential tasks. In such scenarios, Multimodal Continual Instruction Tuning (MCIT) aims to acquire new capabilities while limiting catastrophic forgetting. Existing methods mainly follow a module-composition paradigm: they maintain task-level prompts or LoRA experts and dynamically route or aggregate a subset of them at inference. However, samples within the same task can still differ substantially in visual scenes, question intents, and reasoning demands. This motivates instance-level adaptation to individual query-image pairs rather than only selecting or combining task-level modules. To this end, we propose DRAPE (Dynamic Cross-Modal Prompt Generation), a prompt-learning framework that synthesizes continuous…
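The contrast the abstract draws between composing task-level modules and adapting to each query-image pair can be made concrete with a small sketch. This is not DRAPE's architecture, which the truncated abstract does not specify. It is a generic illustration, assuming a key-matching router for the task-level case and learnable queries that attend over one sample's image and text tokens for the instance-level case. All function names, dimensions, and random vectors below are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(tokens):
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def route_task_prompt(feature, keys, prompts):
    """Module composition (assumed form): return the stored task-level
    prompt whose key has the highest dot product with the feature."""
    best = max(range(len(keys)), key=lambda i: dot(feature, keys[i]))
    return prompts[best]


def generate_instance_prompt(queries, image_tokens, text_tokens):
    """Instance-level generation (assumed form): each query attends over
    this sample's image and text tokens and yields one prompt token."""
    tokens = image_tokens + text_tokens
    dim = len(tokens[0])
    prompt = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(dim) for t in tokens])
        prompt.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return prompt


# Hypothetical toy setup: random vectors stand in for encoder features.
rng = random.Random(0)
DIM = 4


def rand_vecs(n):
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


keys = rand_vecs(3)                          # one key per earlier task
prompts = [rand_vecs(2) for _ in range(3)]   # one stored prompt per task
queries = rand_vecs(2)                       # shared learnable queries
text_tokens = rand_vecs(3)                   # same question for both samples
image_a = rand_vecs(5)                       # two different visual scenes
image_b = rand_vecs(5)

# The router here only sees the pooled question, so both samples get one prompt.
router_feature = mean_pool(text_tokens)
routed_a = route_task_prompt(router_feature, keys, prompts)
routed_b = route_task_prompt(router_feature, keys, prompts)

# The generator sees each sample's own tokens, so the prompts diverge.
generated_a = generate_instance_prompt(queries, image_a, text_tokens)
generated_b = generate_instance_prompt(queries, image_b, text_tokens)
```

In this toy setup `routed_a` and `routed_b` are the same stored prompt, while `generated_a` and `generated_b` differ because the attention weights depend on each sample's image tokens. That difference is the within-task variation the abstract argues task-level selection cannot capture.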
