Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning

Qifan Zhang; Yunhui Guo; Yu Xiang

arXiv:2407.13911·cs.CV·May 21, 2025

TL;DR

This paper proposes KDP (Knowledge Distillation based on Prompts), a knowledge distillation method for prompt-based continual learning that distills a large vision transformer into a small one, improving the student model while keeping inference efficient.

Contribution

Introduces KDP, a prompt-based knowledge distillation technique for continual learning in which globally accessible, distillation-specific prompts are inserted into the frozen ViT backbone of the student model.

Findings

1. KDP outperforms existing KD methods in CDL scenarios.

2. Global prompts enhance knowledge transfer effectiveness.

3. Distillation improves inference efficiency in prompt-based CL models.

Abstract

We introduce the problem of continual distillation learning (CDL) in order to use knowledge distillation (KD) to improve prompt-based continual learning (CL) models. The CDL problem is valuable to study since the use of a larger vision transformer (ViT) leads to better performance in prompt-based continual learning. The distillation of knowledge from a large ViT to a small ViT improves the inference efficiency for prompt-based CL models. We empirically found that existing KD methods such as logit distillation and feature distillation cannot effectively improve the student model in the CDL setup. To address this issue, we introduce a novel method named Knowledge Distillation based on Prompts (KDP), in which globally accessible prompts specifically designed for knowledge distillation are inserted into the frozen ViT backbone of the student model. We demonstrate that our KDP method…
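
The abstract names logit distillation as one of the existing KD baselines that fall short in the CDL setup. As a point of reference, here is a minimal sketch of that baseline in its standard temperature-scaled form: a KL divergence between softened teacher and student distributions. It is not the KDP method, and the function names, the default temperature of 2.0, and the T² scaling are illustrative assumptions rather than details taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Hypothetical helper for illustration: the temperature value and the
    T^2 scaling follow the common KD convention, not the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # T^2 keeps gradient magnitudes comparable across temperatures.
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # e.g. logits from the large ViT
    student = [1.0, 0.8, -0.2]   # e.g. logits from the small ViT
    print(logit_distillation_loss(student, teacher))
```

In a training loop this term would typically be added to the student's ordinary classification loss with a weighting factor; how that weighting is chosen, and how it interacts with prompt learning, is outside what the abstract states.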

Taxonomy

Topics: Process Optimization and Integration · Advanced Control Systems Optimization · Fault Detection and Control Systems

Methods: Linear Layer · Softmax · Attention Is All You Need · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Focus · Knowledge Distillation