Sparse Tuning Enhances Plasticity in PTM-based Continual Learning
Huan Zhang, Shenghua Fan, Shuyu Dong, Yujin Zheng, Dingwen Wang, Fan Lyu

TL;DR
This paper introduces MIST, a sparse tuning method that selectively updates a tiny fraction of pre-trained model parameters based on mutual information, significantly improving continual learning performance while preserving pre-trained knowledge.
Contribution
The paper proposes a novel mutual information-guided sparse tuning approach that updates less than 5% of parameters, enhancing adaptability and generalization in continual learning.
Findings
MIST improves performance across various benchmarks.
Fewer than 0.5% of parameters are updated per step.
Integrating MIST with baselines yields significant gains.
Abstract
Continual Learning with Pre-trained Models (PTMs) holds great promise for efficient adaptation across sequential tasks. However, most existing approaches freeze PTMs and rely on auxiliary modules like prompts or adapters, limiting model plasticity and leading to suboptimal generalization when facing significant distribution shifts. While full fine-tuning can improve adaptability, it risks disrupting crucial pre-trained knowledge. In this paper, we propose Mutual Information-guided Sparse Tuning (MIST), a plug-and-play method that selectively updates a small subset of PTM parameters, less than 5%, based on sensitivity to mutual information objectives. MIST enables effective task-specific adaptation while preserving generalization. To further reduce interference, we introduce strong sparsity regularization by randomly dropping gradients during tuning, resulting in fewer than 0.5% of parameters…
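The two ideas in the abstract, selecting a small subset of parameters by a sensitivity score and then randomly dropping gradients within that subset, can be illustrated with a minimal sketch. This is not the authors' implementation: the sensitivity scores below are random stand-ins (the paper derives them from mutual information objectives, which are not reproduced here), and the names `top_k_mask`, `sparse_step`, `keep_ratio`, and `drop_prob`, as well as the value `drop_prob=0.9`, are hypothetical choices made only for illustration.

```python
import random


def top_k_mask(sensitivity, keep_ratio):
    """Return a 0/1 mask that keeps the keep_ratio fraction of entries
    with the largest absolute sensitivity (hypothetical selection rule)."""
    n = len(sensitivity)
    k = max(1, int(n * keep_ratio))
    ranked = sorted(range(n), key=lambda i: abs(sensitivity[i]), reverse=True)
    mask = [0] * n
    for i in ranked[:k]:
        mask[i] = 1
    return mask


def sparse_step(params, grads, mask, lr, drop_prob, rng):
    """One SGD-style step that touches only masked entries and randomly
    drops each surviving gradient with probability drop_prob."""
    new_params = list(params)
    updated = 0
    for i, (p, g, m) in enumerate(zip(params, grads, mask)):
        if m and rng.random() >= drop_prob:
            new_params[i] = p - lr * g
            updated += 1
    return new_params, updated


rng = random.Random(0)
n = 1000
# Stand-in scores; the paper's mutual-information sensitivity is NOT computed here.
sensitivity = [rng.gauss(0, 1) for _ in range(n)]
grads = [rng.gauss(0, 1) for _ in range(n)]
params = [0.0] * n

mask = top_k_mask(sensitivity, keep_ratio=0.05)   # at most 5% of entries are eligible
new_params, updated = sparse_step(
    params, grads, mask, lr=0.1, drop_prob=0.9, rng=rng  # assumed drop rate
)
print(f"eligible: {sum(mask)} of {n}, updated this step: {updated}")
```

With these assumed settings, 5% of entries are eligible and each eligible entry is updated with probability 0.1, so roughly 0.5% of entries change per step in expectation. The per-step count varies with the random draw.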
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing
