ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Zhiyuan Wang, Bokui Chen

TL;DR
ChordPrompt improves continual learning in vision-language models such as CLIP by letting visual and textual prompts interact and by adding domain-adaptive text prompts, raising performance across multiple domains without extensive retraining.
Contribution
The paper introduces ChordPrompt, a novel framework that leverages cross-modal prompts and domain-adaptive text prompts for multi-domain incremental learning in CLIP.
Findings
Outperforms state-of-the-art methods in zero-shot generalization.
Achieves superior downstream task performance across multiple domains.
Demonstrates effective cross-modal prompt synergy in continual learning.
Abstract
Continual learning (CL) empowers pre-trained vision-language models to adapt effectively to novel or previously underrepresented data distributions without comprehensive retraining, enhancing their adaptability and efficiency. While vision-language models like CLIP show great promise, they struggle to maintain performance across domains in incremental learning scenarios. Existing prompt learning methods face two main limitations: 1) they primarily focus on class-incremental learning scenarios, lacking specific strategies for multi-domain task incremental learning; 2) most current approaches employ single-modal prompts, neglecting the potential benefits of cross-modal information exchange. To address these challenges, we propose the ChordPrompt framework, which facilitates a harmonious interplay between visual and textual prompts. ChordPrompt introduces cross-modal prompts to leverage…
