Memory-Efficient Transfer Learning with Fading Side Networks via Masked Dual Path Distillation

Yutong Zhang; Jiaxin Chen; Honglin Chen; Kaiqi Zheng; Shengcai Liao; Hanwen Zhong; Weixin Li; Yunhong Wang

arXiv:2604.09088·cs.CV·April 13, 2026

TL;DR

This paper introduces Masked Dual Path Distillation (MDPD), a memory-efficient transfer learning method that speeds up inference by discarding the side network after training, while keeping fine-tuning parameter- and memory-efficient and improving accuracy across vision and language tasks.

Contribution

The paper proposes a framework that mutually distills the frozen backbone and the side network during fine-tuning, then discards the side network at inference.

Findings

01. Accelerates inference by at least 25.2%
02. Maintains parameter and memory efficiency during fine-tuning
03. Improves accuracy over state-of-the-art methods

Abstract

Memory-efficient transfer learning (METL) approaches have recently achieved promising performance in adapting pre-trained models to downstream tasks. They avoid gradient backpropagation through large backbones, significantly reducing both the number of trainable parameters and the memory consumed during fine-tuning. However, because they typically employ a lightweight, learnable side network, these methods inevitably introduce additional memory and time overhead during inference, which contradicts the ultimate goal of efficient transfer learning. To address this issue, we propose a novel approach dubbed Masked Dual Path Distillation (MDPD), which accelerates inference while retaining parameter and memory efficiency in fine-tuning with fading side networks. Specifically, MDPD develops a framework that enhances the performance by mutually distilling the frozen backbones and…
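The side-network setup that the abstract describes can be illustrated with a toy sketch. This is not the MDPD method and not the authors' code: the scalar "layers", the fixed weights, the data, and every function name below are hypothetical, chosen only to show why training a side network needs the backbone's forward activations but no gradients through the backbone, and why inference in such a setup still has to run both networks.

```python
import math

# Frozen "backbone": two fixed scalar layers (hypothetical toy weights).
BACKBONE_WEIGHTS = (0.8, -1.3)


def backbone_forward(x):
    """Return the intermediate activations of the frozen backbone."""
    h1 = math.tanh(BACKBONE_WEIGHTS[0] * x)
    h2 = math.tanh(BACKBONE_WEIGHTS[1] * h1)
    return h1, h2


def side_forward(side, acts):
    """Lightweight side network: a learnable linear readout of the activations."""
    return side[0] * acts[0] + side[1] * acts[1]


def train_side(data, epochs=200, lr=0.2):
    """Fit only the side parameters with plain per-sample gradient descent."""
    side = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            acts = backbone_forward(x)  # forward pass only, nothing kept for backprop
            err = side_forward(side, acts) - y
            # The squared-error gradient w.r.t. the side parameters depends only
            # on the activations, so the backbone weights are never differentiated.
            side[0] -= lr * 2 * err * acts[0]
            side[1] -= lr * 2 * err * acts[1]
    return side


def mse(side, data):
    """Mean squared error; note that prediction runs backbone AND side network."""
    total = sum((side_forward(side, backbone_forward(x)) - y) ** 2 for x, y in data)
    return total / len(data)


# Hypothetical toy regression data.
data = [(-1.0, -0.5), (0.5, 0.3), (1.0, 0.5)]
before = mse([0.0, 0.0], data)
side = train_side(data)
after = mse(side, data)
print(f"MSE before: {before:.4f}, after: {after:.6f}")
```

In this toy, every prediction calls both `backbone_forward` and `side_forward`, which is the inference overhead the abstract objects to. The abstract indicates that MDPD removes this overhead by discarding the side network after training; how it does so is not reproduced here.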

Code & Models

Repositories

Zhang-VKk/MDPD (GitHub)
