Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models
Yicheng Xu, Yuxin Chen, Jiahao Nie, Yusong Wang, Huiping Zhuang, Manabu Okumura

TL;DR
This paper introduces RAIL, a novel continual learning method for vision-language models that preserves zero-shot capabilities across domains without extra data, and proposes the new X-TAIL setting for domain-agnostic incremental learning.
Contribution
The paper presents RAIL, a recursive ridge regression adapter that decouples cross-domain correlations and maintains zero-shot abilities without reference data, and introduces the X-TAIL setting for domain-agnostic continual learning.
Findings
RAIL achieves state-of-the-art results in X-TAIL and multi-domain task-incremental learning.
RAIL preserves zero-shot abilities on unseen domains without reference datasets.
The paper gives a theoretical proof of RAIL's absolute memorization on learned domains.
Abstract
Continual learning (CL) with Vision-Language Models (VLMs) has overcome the constraints of traditional CL, which focuses only on previously encountered classes. During the CL of VLMs, we need not only to prevent catastrophic forgetting of incrementally learned knowledge but also to preserve the zero-shot ability of VLMs. However, existing methods require additional reference datasets to maintain such zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which utilizes a recursive ridge regression-based adapter to learn from a sequence of domains in a non-forgetting manner and decouple the cross-domain correlations by projecting features to a higher-dimensional space. Cooperating with a training-free fusion module, RAIL absolutely preserves the VLM's zero-shot…
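To make the "recursive ridge regression" idea in the abstract more concrete, here is a minimal NumPy sketch. It is an illustration of the general technique, not the authors' implementation: the class name `RecursiveRidge`, the random ReLU projection, and the parameters `hidden_dim` and `lam` are all assumptions made for this example. The sketch keeps a running inverse of the regularized Gram matrix and updates it with the Woodbury identity, so each new domain is absorbed without revisiting earlier data.

```python
import numpy as np


class RecursiveRidge:
    """Ridge regression classifier updated one domain at a time (illustrative sketch)."""

    def __init__(self, in_dim, hidden_dim=256, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed choice: a fixed random projection to a higher-dimensional space.
        self.P = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)
        # R tracks (Phi^T Phi + lam * I)^{-1} over all data seen so far.
        self.R = np.eye(hidden_dim) / lam
        # Weight matrix; columns are added as new classes arrive.
        self.W = np.zeros((hidden_dim, 0))

    def project(self, X):
        # Random projection followed by ReLU (hypothetical feature expansion).
        return np.maximum(X @ self.P, 0.0)

    def fit_domain(self, X, Y):
        """X: (n, in_dim) features. Y: (n, c) one-hot labels, where c counts
        every class seen so far, including those introduced by this domain."""
        Phi = self.project(X)
        # Pad W with zero columns for classes that appear for the first time.
        new_cols = Y.shape[1] - self.W.shape[1]
        if new_cols > 0:
            pad = np.zeros((self.W.shape[0], new_cols))
            self.W = np.hstack([self.W, pad])
        # Woodbury update of the inverse regularized Gram matrix.
        K = np.linalg.inv(np.eye(Phi.shape[0]) + Phi @ self.R @ Phi.T)
        self.R = self.R - self.R @ Phi.T @ K @ Phi @ self.R
        # Correct the weights using the residual on the new domain only.
        self.W = self.W + self.R @ Phi.T @ (Y - Phi @ self.W)

    def predict(self, X):
        # Class scores; take argmax over columns for a label.
        return self.project(X) @ self.W
```

Under these assumptions, calling `fit_domain` once per domain yields the same weights as solving a single ridge regression over all domains jointly, which is one way to read the "non-forgetting" property the abstract describes.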
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Adapter
