Regularized Mask Tuning: Uncovering Hidden Knowledge in Pre-trained Vision-Language Models
Kecheng Zheng, Wei Wu, Ruili Feng, Kai Zhu, Jiawei Liu, Deli Zhao, Zheng-Jun Zha, Wei Chen, Yujun Shen

TL;DR
Regularized mask tuning uncovers hidden knowledge in pre-trained vision-language models by selectively masking parameters, leading to significant performance improvements across multiple datasets without extensive retraining.
Contribution
The paper introduces a novel regularized mask tuning method that identifies and leverages important parameters for downstream tasks in pre-trained VLMs, enhancing transfer performance.
Findings
Achieves an 18.73% performance improvement over zero-shot CLIP.
Masks an average of only 2.56% of the parameters.
Is compatible with existing tuning techniques and boosts performance when combined with them.
Abstract
Prompt tuning and adapter tuning have shown great potential in transferring pre-trained vision-language models (VLMs) to various downstream tasks. In this work, we design a new type of tuning method, termed as regularized mask tuning, which masks the network parameters through a learnable selection. Inspired by neural pathways, we argue that the knowledge required by a downstream task already exists in the pre-trained weights but just gets concealed in the upstream pre-training stage. To bring the useful knowledge back into light, we first identify a set of parameters that are important to a given downstream task, then attach a binary mask to each parameter, and finally optimize these masks on the downstream data with the parameters frozen. When updating the mask, we introduce a novel gradient dropout strategy to regularize the parameter selection, in order to prevent the model from…
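The mask-optimization step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a single linear unit, real-valued mask scores binarized by a threshold, a straight-through gradient estimate, and a simplified gradient dropout that zeroes each mask-gradient entry at random. All function names and hyperparameters here (`binarize`, `mask_tuning_step`, `drop_prob`, `lr`) are hypothetical, and the step that first identifies task-important parameters is omitted.

```python
import random


def binarize(scores, threshold=0.0):
    """Hard 0/1 mask: keep a weight when its score exceeds the threshold."""
    return [1.0 if s > threshold else 0.0 for s in scores]


def forward(weights, mask, x):
    """Linear unit whose frozen weights are gated element-wise by the mask."""
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))


def masked_fraction(scores):
    """Share of parameters currently switched off by the mask."""
    mask = binarize(scores)
    return mask.count(0.0) / len(mask)


def mask_tuning_step(weights, scores, x, target,
                     lr=0.1, drop_prob=0.5, rng=random):
    """One update of the mask scores; `weights` are never modified."""
    mask = binarize(scores)
    error = forward(weights, mask, x) - target
    # Gradient of 0.5 * error**2 w.r.t. each mask entry, passed straight
    # through the non-differentiable threshold to the real-valued score.
    grads = [error * w * xi for w, xi in zip(weights, x)]
    # Simplified gradient dropout (an assumption, not the paper's rule):
    # zero each gradient entry independently with probability drop_prob.
    grads = [0.0 if rng.random() < drop_prob else g for g in grads]
    return [s - lr * g for s, g in zip(scores, grads)]


if __name__ == "__main__":
    frozen = [1.0, 5.0]      # stand-in for pre-trained weights
    scores = [1.0, 1.0]      # all-positive scores: nothing masked at start
    rng = random.Random(0)
    for _ in range(20):
        scores = mask_tuning_step(frozen, scores, [1.0, 1.0], 1.0, rng=rng)
    print("mask:", binarize(scores), "masked:", masked_fraction(scores))
```

Initializing every score above the threshold means the first forward pass reproduces the unmasked model, so training only switches off parameters that hurt the downstream objective. Only `scores` is updated, which mirrors the abstract's description of optimizing masks with the parameters frozen.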
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Dropout · Contrastive Language-Image Pre-training · Adapter
