ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP

Zhiyuan Wang; Bokui Chen

arXiv:2506.19608·cs.AI·September 4, 2025


TL;DR

ChordPrompt improves continual learning in vision-language models by letting visual and textual prompts interact across modalities and by adapting prompts to each domain. This significantly improves performance across multiple domains without extensive retraining.

Contribution

The paper introduces ChordPrompt, a novel framework that leverages cross-modal prompts and domain-adaptive text prompts for multi-domain incremental learning in CLIP.

Findings

01

Outperforms state-of-the-art methods in zero-shot generalization.

02

Achieves superior downstream task performance across multiple domains.

03

Demonstrates effective cross-modal prompt synergy in continual learning.

Abstract

Continual learning (CL) empowers pre-trained vision-language models to adapt effectively to novel or previously underrepresented data distributions without comprehensive retraining, enhancing their adaptability and efficiency. While vision-language models like CLIP show great promise, they struggle to maintain performance across domains in incremental learning scenarios. Existing prompt learning methods face two main limitations: 1) they primarily focus on class-incremental learning scenarios, lacking specific strategies for multi-domain task incremental learning; 2) most current approaches employ single-modal prompts, neglecting the potential benefits of cross-modal information exchange. To address these challenges, we propose the ChordPrompt framework, which facilitates a harmonious interplay between visual and textual prompts. ChordPrompt introduces cross-modal prompts to leverage…
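The abstract is cut off before it explains how the cross-modal prompts are built. As a rough illustration of what "cross-modal" prompting can mean in general, the sketch below derives visual prompt tokens from textual prompt tokens through a shared linear coupling, so one set of learnable text prompts steers the inputs of both CLIP encoders. The coupling scheme, the function names (`make_prompts`, `couple`, `build_inputs`), and the toy dimensions are all hypothetical assumptions chosen for illustration; this is not ChordPrompt's published design.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def make_prompts(n_tokens: int, dim: int, seed: int) -> Matrix:
    """Hypothetical learnable tokens: n_tokens rows of width dim, small random init."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n_tokens)]


def couple(text_prompts: Matrix, weight: Matrix) -> Matrix:
    """Assumed linear coupling: map each text-prompt token (width dim_t)
    to a visual-prompt token (width dim_v). weight has shape (dim_t, dim_v)."""
    dim_v = len(weight[0])
    return [
        [sum(tok[i] * weight[i][j] for i in range(len(tok))) for j in range(dim_v)]
        for tok in text_prompts
    ]


def build_inputs(text_embeds: Matrix, image_patches: Matrix,
                 text_prompts: Matrix, weight: Matrix) -> Tuple[Matrix, Matrix]:
    """Prepend prompts to each encoder's token sequence. The visual prompts are
    derived from the text prompts, so both modalities share one prompt source."""
    visual_prompts = couple(text_prompts, weight)
    return text_prompts + text_embeds, visual_prompts + image_patches


# Toy sizes (hypothetical, far smaller than real CLIP widths).
TEXT_DIM, VISION_DIM, N_PROMPTS = 8, 12, 4
text_prompts = make_prompts(N_PROMPTS, TEXT_DIM, seed=0)
coupling = make_prompts(TEXT_DIM, VISION_DIM, seed=1)   # (TEXT_DIM x VISION_DIM)
text_embeds = make_prompts(5, TEXT_DIM, seed=2)         # stand-in word embeddings
image_patches = make_prompts(9, VISION_DIM, seed=3)     # stand-in patch embeddings

text_seq, image_seq = build_inputs(text_embeds, image_patches, text_prompts, coupling)
print(len(text_seq), len(image_seq), len(image_seq[0]))  # 9 13 12
```

Because the visual prompts are a function of the text prompts, a gradient step on the text prompts would also move the visual prompts. That shared dependency is one simple way two modalities' prompts can exchange information; the paper's actual mechanism may differ.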
