Continual Learning on CLIP via Incremental Prompt Tuning with Intrinsic Textual Anchors

Haodong Lu; Xinyu Zhang; Kristen Moore; Jason Xue; Lina Yao; Anton van den Hengel; Dong Gong

arXiv:2505.20680·cs.CV·December 22, 2025

TL;DR

This paper introduces TPPT, an incremental prompt tuning method for CLIP that uses textual prototypes as stable anchors to reduce forgetting and improve adaptation during continual learning.

Contribution

The paper proposes a concise continual learning approach that relies on CLIP's intrinsic structure: textual prototypes serve as anchors, supervision is bidirectional, and regularization prevents embedding collapse.
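The ideas named above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the temperature value, the reading of "bidirectional" as image-to-text plus text-to-image cross-entropy, and the cosine-similarity diversity penalty are all assumptions chosen for illustration, since this summary does not give the exact loss terms.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def bidirectional_anchor_loss(img_emb, txt_proto, labels, temperature=0.07):
    """Hypothetical bidirectional loss against per-class textual prototypes.

    img_emb:   (batch, dim) image embeddings
    txt_proto: (classes, dim) one textual prototype per class (the anchors)
    labels:    (batch,) integer class index of each image
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_proto)
    logits = img @ txt.T / temperature  # (batch, classes) scaled cosine similarity
    rows = np.arange(len(labels))
    # Image -> text: each image should select its own class prototype.
    i2t = -log_softmax(logits, axis=1)[rows, labels].mean()
    # Text -> image: each prototype should select images of its class
    # among all images in the batch.
    t2i = -log_softmax(logits, axis=0)[rows, labels].mean()
    return i2t + t2i

def diversity_penalty(txt_proto):
    """Assumed anti-collapse regularizer: mean squared off-diagonal cosine similarity."""
    txt = l2_normalize(txt_proto)
    sim = txt @ txt.T
    off_diag = sim[~np.eye(len(txt), dtype=bool)]
    return float(np.mean(off_diag ** 2))
```

In this sketch the penalty is near zero when prototypes are mutually orthogonal and approaches one when they all point in the same direction, which is the collapse case a regularizer of this kind is meant to discourage.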

Findings

1. Reduces catastrophic forgetting.

2. Improves continual adaptation performance.

3. Leverages CLIP's multi-modal structure effectively.

Abstract

Continual learning (CL) enables deep networks to acquire new knowledge while avoiding catastrophic forgetting. The powerful generalization ability of pre-trained models (PTMs), such as the Contrastive Language-Image Pre-training (CLIP) model, has inspired a range of CL methods targeting new and specialized tasks, providing rich multi-modal embeddings that support lightweight, incremental prompt tuning. Existing methods often rely on complex designs built upon specific assumptions, such as intricate regularization schemes for prompt pools, specialized routing mechanisms, or multi-stage incrementations, that introduce additional (and possibly unnecessary) complexity, underutilizing CLIP's intrinsic capabilities. In this paper, we propose a concise CL approach for CLIP based on incremental prompt tuning that fully exploits its multi-modal structure and the stability of textual…

Taxonomy

Topics: Speech Recognition and Synthesis · Text and Document Classification Technologies · Natural Language Processing Techniques