Deep Reprogramming Distillation for Medical Foundation Models
Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang, Hui Lin, Ya Zhang, Yanfeng Wang

TL;DR
This paper introduces Deep Reprogramming Distillation (DRD), a novel framework that enhances the adaptation of large medical foundation models to specific tasks by overcoming domain gaps and improving knowledge transfer efficiency.
Contribution
The paper proposes DRD, which combines a reprogramming module with centered kernel alignment (CKA) distillation to enable effective adaptation and robust knowledge transfer from foundation models to lightweight medical models.
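This summary does not spell out how the CKA distillation term is computed. As a rough illustration only, the sketch below uses the standard linear form of centered kernel alignment (CKA) to compare a batch of student features with a batch of teacher features, and turns the similarity into a loss as `1 - CKA`. The function names, the `1 - CKA` form, and the choice of the linear kernel are assumptions made for this example and should not be read as the authors' implementation.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every feature has zero mean over the batch."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_gram_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for two matrices with the same number of rows."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_samples, dim_x) and (n_samples, dim_y).

    The two feature widths may differ; only the number of samples must match.
    Features that are constant across the batch would make the denominator zero.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    xc, yc = center_columns(x), center_columns(y)
    cross = cross_gram_sq_norm(xc, yc)
    norm_x = math.sqrt(cross_gram_sq_norm(xc, xc))
    norm_y = math.sqrt(cross_gram_sq_norm(yc, yc))
    return cross / (norm_x * norm_y)


def cka_distillation_loss(student_feats, teacher_feats):
    """Hypothetical loss: small when student and teacher features are aligned."""
    return 1.0 - linear_cka(student_feats, teacher_feats)


# Toy batch of 4 samples: teacher features are 3-dimensional, student features 2-dimensional.
teacher = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0], [2.0, -1.0, 1.0], [4.0, 0.0, -2.0]]
student = [[0.3, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 4.0]]
loss = cka_distillation_loss(student, teacher)
```

Because CKA compares the similarity structure of a batch rather than individual feature values, it does not require the student and teacher to have the same feature width, which is one reason it is a plausible fit when the two models come from different architecture families.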
Findings
DRD outperforms previous PEFT and KD methods on 18 medical tasks.
DRD effectively handles 2D/3D classification and segmentation scenarios.
Empirical results demonstrate superior performance across diverse medical applications.
Abstract
Medical foundation models pre-trained on large-scale datasets have shown strong, versatile performance. However, adapting them to specific medical scenarios remains challenging because of the gap between pre-training and downstream tasks and because of real-world computation and speed constraints. Existing techniques that could address this challenge each suffer from intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve inconsistencies in model structure and training strategy between teacher and student models, leading to…
