Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

Longxiang Tang; Zhuotao Tian; Kai Li; Chunming He; Hantao Zhou; Hengshuang Zhao; Xiu Li; Jiaya Jia

arXiv:2407.05342·cs.CV·July 9, 2024

TL;DR

This paper introduces DIKI (Distribution-aware Interference-free Knowledge Integration), a parameter-efficient framework for continual learning in vision-language models. It retains pre-trained knowledge by avoiding information interference when adapting to new tasks, so the model keeps its zero-shot ability while only a small fraction of its parameters is trained.

Contribution

The study proposes a distribution-aware, interference-free knowledge integration method that retains the knowledge of pre-trained Vision-Language Models (VLMs) during continual learning, while reducing computational cost and the number of parameters updated.

Findings

01 Outperforms state-of-the-art methods while training only 0.86% of parameters

02 Requires significantly less training time than existing methods

03 Effectively preserves zero-shot capabilities across tasks

Abstract

This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability. Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy computation overhead. To address this problem efficiently, we propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of VLMs from a perspective of avoiding information interference. Specifically, we design a fully residual…

Code & Models

Repositories

lloongx/diki (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Knowledge Distillation