Continual Distillation Learning: Knowledge Distillation in Prompt-based Continual Learning
Qifan Zhang, Yunhui Guo, Yu Xiang

TL;DR
This paper proposes a novel knowledge distillation method called KDP for prompt-based continual learning, improving efficiency and performance by distilling from large to small vision transformers.
Contribution
Introduces KDP, a new prompt-based knowledge distillation technique tailored for continual learning with vision transformers.
Findings
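KDP outperforms existing knowledge distillation (KD) methods, such as logit and feature distillation, in continual distillation learning (CDL) scenarios.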
Global prompts enhance knowledge transfer effectiveness.
Distillation improves inference efficiency in prompt-based CL models.
Abstract
We introduce the problem of continual distillation learning (CDL) in order to use knowledge distillation (KD) to improve prompt-based continual learning (CL) models. The CDL problem is valuable to study since the use of a larger vision transformer (ViT) leads to better performance in prompt-based continual learning. The distillation of knowledge from a large ViT to a small ViT improves the inference efficiency for prompt-based CL models. We empirically found that existing KD methods such as logit distillation and feature distillation cannot effectively improve the student model in the CDL setup. To address this issue, we introduce a novel method named Knowledge Distillation based on Prompts (KDP), in which globally accessible prompts specifically designed for knowledge distillation are inserted into the frozen ViT backbone of the student model. We demonstrate that our KDP method…
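For readers unfamiliar with the baselines the abstract mentions, logit distillation in its common textbook form can be sketched as below. This is the generic temperature-scaled KL formulation, not the paper's KDP method and not necessarily the exact baseline configuration the authors evaluated. The function names and the default temperature of 2.0 are illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration: the temperature value and the
    T^2 scaling follow the common convention, not anything stated in
    the abstract.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# Example: a large-ViT teacher and a small-ViT student scoring 3 classes.
teacher = [1.0, 2.0, 3.0]
student = [3.0, 2.0, 1.0]
loss = logit_distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. According to the abstract, this kind of objective alone does not effectively improve the student in the CDL setup, which is the motivation for KDP.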
Taxonomy
Topics: Process Optimization and Integration · Advanced Control Systems Optimization · Fault Detection and Control Systems
Methods: Linear Layer · Softmax · Attention Is All You Need · Dense Connections · Multi-Head Attention · Layer Normalization · Residual Connection · Vision Transformer · Focus · Knowledge Distillation
