Minimal Interaction Separated Tuning: A New Paradigm for Visual Adaptation
Ningyuan Tang, Minghao Fu, Jianxin Wu

TL;DR
This paper introduces MIST, a separated tuning method for large vision models that enables efficient adaptation on low-resource devices by leveraging intermediate features and a lightweight attention-based adapter.
Contribution
MIST presents a new separated tuning paradigm that reduces information transfer and computational costs while maintaining high adaptation performance.
Findings
MIST achieves competitive results on visual adaptation benchmarks.
It significantly reduces information transfer overhead.
It demonstrates high efficiency in parameters, computation, and memory.
Abstract
The rapid scaling of large pretrained vision models makes fine-tuning increasingly difficult on devices with low computational resources. We explore a new visual adaptation paradigm called separated tuning, which treats large pretrained models as standalone feature extractors that run on powerful cloud servers. Fine-tuning is carried out on devices that possess only low computational resources (slow CPU, no GPU, small memory, etc.). Existing methods that are potentially suitable for our separated tuning paradigm are discussed, but three major drawbacks hinder their application in separated tuning: low adaptation capability, large adapter network, and, in particular, high information transfer overhead. To address these issues, we propose Minimal Interaction Separated Tuning, or MIST, which reveals that the sum of intermediate features from pretrained models not only has minimal…
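The abstract's idea of sending the sum of intermediate features, rather than every layer's features, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the random stand-in for a frozen backbone, and the shapes (12 layers, 5 tokens, 8 dimensions) are all hypothetical, and the sketch assumes the point of summing is that one feature map is cheaper to transfer than one map per layer.

```python
import random

def frozen_backbone_features(num_layers, num_tokens, dim, seed=0):
    """Hypothetical stand-in for a frozen pretrained model on the server.

    Returns one (num_tokens x dim) feature map per layer, filled with
    random values purely for illustration.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tokens)]
        for _ in range(num_layers)
    ]

def sum_over_layers(layer_feats):
    """Element-wise sum of all layers' feature maps into a single map."""
    num_tokens = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    return [
        [sum(layer[t][d] for layer in layer_feats) for d in range(dim)]
        for t in range(num_tokens)
    ]

def num_values(nested):
    """Count scalar entries in a nested list (a proxy for transfer size)."""
    if isinstance(nested, list):
        return sum(num_values(x) for x in nested)
    return 1

# Server side: extract per-layer features, then collapse them by summation.
layer_feats = frozen_backbone_features(num_layers=12, num_tokens=5, dim=8)
summed = sum_over_layers(layer_feats)

# Compare how many values would be sent to the device in each case.
values_all_layers = num_values(layer_feats)
values_summed = num_values(summed)
reduction = values_all_layers // values_summed

print(f"all layers: {values_all_layers} values, summed: {values_summed} values")
print(f"transfer reduced by a factor of {reduction}")
```

Under these assumptions the transfer size shrinks by a factor equal to the number of layers. The device would then train only a small adapter on the received map; that step is omitted here because the excerpt does not describe the adapter's architecture beyond it being lightweight and attention-based.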
Taxonomy
Topics: Color Science and Applications · Image and Video Quality Assessment · Advanced Vision and Imaging
Methods: Adapter
