LVP-CLIP: Revisiting CLIP for Continual Learning with Label Vector Pool

Yue Ma; Huantao Ren; Boyu Wang; Jingang Jin; Senem Velipasalar; Qinru Qiu

arXiv:2412.05840·cs.CV·December 10, 2024

TL;DR

This paper introduces Label Vector Pool (LVP), a novel approach for continual learning with CLIP that replaces text labels with training images as references, reducing reliance on text quality and minimizing forgetting.

Contribution

The paper proposes LVP, a new method that improves CLIP-based continual learning by using training images as similarity references, enabling task-order invariance and reducing forgetting.

Findings

01

LVP-based methods outperform the current state of the art by 40.7%

02

LVP reduces dependence on text label quality

03

LVP enables task-order invariant learning with low computational demands

Abstract

Continual learning aims to update a model so that it can sequentially learn new tasks without forgetting previously acquired knowledge. Recent continual learning approaches often leverage the vision-language model CLIP for its high-dimensional feature space and cross-modality feature matching. Traditional CLIP-based classification methods identify the most similar text label for a test image by comparing their embeddings. However, these methods are sensitive to the quality of text phrases and less effective for classes lacking meaningful text labels. In this work, we rethink CLIP-based continual learning and introduce the concept of Label Vector Pool (LVP). LVP replaces text labels with training images as similarity references, eliminating the need for ideal text descriptions. We present three variations of LVP and evaluate their performance on class and domain incremental learning…
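The core idea in the abstract, using training images instead of text labels as similarity references, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper describes three LVP variations whose details are not given here, so the sketch assumes the simplest plausible form, one reference vector per class computed as the mean of normalized image embeddings. The class `LabelVectorPool`, its methods, and the toy 3-dimensional vectors are all hypothetical; in practice the embeddings would come from a CLIP image encoder.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class LabelVectorPool:
    """Hypothetical pool: one reference vector per class, built from image embeddings."""

    def __init__(self):
        self._sums = {}                  # class label -> sum of normalized embeddings
        self._counts = defaultdict(int)  # class label -> number of embeddings seen

    def add_task(self, embeddings, labels):
        """Fold one task's training embeddings into the pool.

        Each class keeps its own running sum, so adding a new task
        never modifies the vectors of previously learned classes.
        """
        for vec, label in zip(embeddings, labels):
            unit = normalize(vec)
            if label not in self._sums:
                self._sums[label] = [0.0] * len(unit)
            self._sums[label] = [s + u for s, u in zip(self._sums[label], unit)]
            self._counts[label] += 1

    def label_vectors(self):
        """Return the normalized mean embedding for every class in the pool."""
        return {
            label: normalize([s / self._counts[label] for s in total])
            for label, total in self._sums.items()
        }

    def predict(self, embedding):
        """Return the class whose label vector is most similar to the query."""
        query = normalize(embedding)
        vectors = self.label_vectors()
        return max(
            vectors,
            key=lambda label: sum(q * v for q, v in zip(query, vectors[label])),
        )


# Toy stand-ins for CLIP image embeddings (assumed values, not real features).
task_a = ([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]], ["cat", "cat"])
task_b = ([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]], ["dog", "dog"])

pool = LabelVectorPool()
pool.add_task(*task_a)
pool.add_task(*task_b)
prediction = pool.predict([0.8, 0.2, 0.0])
```

Under these assumptions, each class vector depends only on that class's own training images, which is one way to obtain the task-order invariance mentioned in the summary: presenting `task_b` before `task_a` yields the same pool.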

Taxonomy

Topics: Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning · Music and Audio Processing

Methods: Contrastive Language-Image Pre-training