LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool
Yue Ma, Huantao Ren, Boyu Wang, Jingang Jin, Senem Velipasalar, Qinru Qiu

TL;DR
This paper introduces Label Vector Pool (LVP), a novel approach for continual learning with CLIP that replaces text labels with training images as references, reducing reliance on text quality and minimizing forgetting.
Contribution
The paper proposes LVP, a new method that improves CLIP-based continual learning by using training images as similarity references, enabling task-order invariance and reducing forgetting.
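A minimal Python sketch of the idea. It assumes the pool stores L2-normalized image embeddings of training images ("label vectors") together with their class labels, and that it predicts the class of the most cosine-similar entry. The class `LabelVectorPool`, its methods, and the random stand-in embeddings are hypothetical illustrations, not the authors' code, and the paper's three LVP variants may build or compare label vectors differently. In this sketch, learning a task only appends entries and prediction takes a maximum over the whole pool, so the result does not depend on the order in which tasks arrive:

```python
import numpy as np


def normalize(x):
    # L2-normalize along the last axis so dot products equal cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class LabelVectorPool:
    """Hypothetical minimal pool of (label vector, class label) pairs.

    Label vectors stand in for CLIP image embeddings of training images.
    """

    def __init__(self):
        self.vectors = []
        self.labels = []

    def add_task(self, embeddings, labels):
        # Learning a new task only appends entries; earlier entries are never
        # modified, so nothing stored for previous tasks is overwritten.
        for vec, label in zip(normalize(np.asarray(embeddings, dtype=float)), labels):
            self.vectors.append(vec)
            self.labels.append(label)

    def predict(self, queries):
        # Cosine similarity of each query (rows of a 2-D array) to every label
        # vector, then return the class of the most similar label vector.
        pool = np.stack(self.vectors)                                 # (P, D)
        sims = normalize(np.asarray(queries, dtype=float)) @ pool.T  # (Q, P)
        return [self.labels[i] for i in sims.argmax(axis=1)]


rng = np.random.default_rng(0)
dim = 16
# Random stand-ins for image embeddings from two tasks (not real CLIP features)
task_a = (rng.normal(size=(6, dim)), ["cat", "dog"] * 3)
task_b = (rng.normal(size=(6, dim)), ["car", "ship"] * 3)
queries = rng.normal(size=(5, dim))

# Same two tasks, learned in opposite orders
pool_ab = LabelVectorPool()
pool_ab.add_task(*task_a)
pool_ab.add_task(*task_b)

pool_ba = LabelVectorPool()
pool_ba.add_task(*task_b)
pool_ba.add_task(*task_a)

print(pool_ab.predict(queries) == pool_ba.predict(queries))  # True
```

Under these assumptions no stored entry is ever updated, which is one way to see why earlier tasks are not overwritten. The trade-off in this sketch is that the pool, and the cost of each prediction, grows with the number of stored training images.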
Findings
LVP-based methods outperform the state of the art by 40.7%
LVP reduces dependence on text label quality
LVP enables task-order invariant learning with low computational demands
Abstract
Continual learning aims to update a model so that it can sequentially learn new tasks without forgetting previously acquired knowledge. Recent continual learning approaches often leverage the vision-language model CLIP for its high-dimensional feature space and cross-modality feature matching. Traditional CLIP-based classification methods identify the most similar text label for a test image by comparing their embeddings. However, these methods are sensitive to the quality of text phrases and less effective for classes lacking meaningful text labels. In this work, we rethink CLIP-based continual learning and introduce the concept of Label Vector Pool (LVP). LVP replaces text labels with training images as similarity references, eliminating the need for ideal text descriptions. We present three variations of LVP and evaluate their performance on class and domain incremental learning…
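Both approaches described in the abstract can be read as nearest-reference classification in CLIP's embedding space; they differ only in which embeddings serve as references. The sketch below uses hand-picked 2-D toy vectors, not real CLIP outputs, and a hypothetical opaque class name `class_042` whose text embedding is assumed to carry little visual meaning. It illustrates the failure mode the abstract describes: matching against text labels can pick the wrong class when a label is uninformative, while matching against training-image embeddings does not depend on the label text. The helper `nearest_reference` is a hypothetical name introduced for illustration.

```python
import numpy as np


def normalize(x):
    # L2-normalize along the last axis so dot products equal cosine similarities
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def nearest_reference(query, references, labels):
    """Return the label whose reference embedding is most cosine-similar to query."""
    refs = normalize(np.asarray(references, dtype=float))
    sims = refs @ normalize(np.asarray(query, dtype=float))
    return labels[int(np.argmax(sims))]


# Text references (traditional CLIP classification): one embedding per class.
# "class_042" is a hypothetical opaque label with an uninformative text embedding.
text_refs = [[1.0, 0.0], [1.0, -1.0]]
text_labels = ["cat", "class_042"]

# Image references (LVP-style): embeddings of training images, two per class.
image_refs = [[1.0, 0.1], [0.9, -0.1], [0.0, 1.0], [0.1, 0.9]]
image_labels = ["cat", "cat", "class_042", "class_042"]

test_image = [0.4, 0.9]  # toy embedding of a test image from class_042

print(nearest_reference(test_image, text_refs, text_labels))    # cat (wrong)
print(nearest_reference(test_image, image_refs, image_labels))  # class_042
```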
Taxonomy
Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Music and Audio Processing
Methods: Contrastive Language-Image Pre-training
