Vision-Language Model Selection and Reuse for Downstream Adaptation
Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo

TL;DR
This paper introduces Model Label Learning (MLL), an efficient paradigm for selecting and reusing pre-trained Vision-Language Models (VLMs) for a specific downstream task, motivated by the observation that no single VLM performs well on every task.
Contribution
The paper proposes MLL, a new method with model labeling, selection, and reuse modules, enabling scalable and task-specific VLM selection and ensemble application.
Findings
MLL effectively selects suitable VLMs for various tasks.
The method demonstrates high computational efficiency and scalability.
Experimental results show improved performance in downstream tasks.
Abstract
Pre-trained Vision-Language Models (VLMs) are becoming increasingly popular across various visual tasks, and several open-sourced VLM variants have been released. However, selecting the best-performing pre-trained VLM for a specific downstream task is challenging, since no single VLM can achieve promising performance on all downstream tasks, and evaluating all available VLMs is impossible due to time and data limitations. To address this problem, this paper proposes a novel paradigm to select and reuse VLMs for downstream tasks, called Model Label Learning (MLL). The proposal contains three key modules: model labeling, which assigns labels to each VLM to describe its specialty and utility; model selection, which matches the requirements of the target task with model labels; and model reuse, which applies selected VLMs to the target task in an ensemble manner. The…
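The three modules named in the abstract can be pictured with a toy sketch. This is not the paper's algorithm: the per-class score table standing in for "model labels", the top-k selection rule, the averaging ensemble, and every name and number below (`MODEL_LABELS`, `select_models`, `ensemble_predict`, `vlm_a`, ...) are assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical "model labels": for each model, a score describing how well
# it handles each semantic class. The numbers are made up for illustration.
MODEL_LABELS: Dict[str, Dict[str, float]] = {
    "vlm_a": {"cat": 0.9, "dog": 0.6, "car": 0.3},
    "vlm_b": {"cat": 0.5, "dog": 0.8, "car": 0.7},
    "vlm_c": {"cat": 0.4, "dog": 0.5, "car": 0.9},
}


def select_models(
    task_classes: List[str],
    labels: Dict[str, Dict[str, float]],
    k: int = 2,
) -> Dict[str, List[str]]:
    """Selection step (assumed rule): for each target class, keep the k
    models whose label score for that class is highest."""
    selected: Dict[str, List[str]] = {}
    for cls in task_classes:
        ranked = sorted(labels, key=lambda m: labels[m].get(cls, 0.0), reverse=True)
        selected[cls] = ranked[:k]
    return selected


def ensemble_predict(
    model_probs: Dict[str, Dict[str, float]],
    selected: Dict[str, List[str]],
) -> str:
    """Reuse step (assumed rule): score each class by averaging the
    probabilities from the models selected for it, then return the
    class with the highest averaged score."""
    scores = {
        cls: sum(model_probs[m][cls] for m in models) / len(models)
        for cls, models in selected.items()
    }
    return max(scores, key=lambda c: scores[c])


# Usage: a two-class target task and made-up per-model outputs for one image.
chosen = select_models(["cat", "car"], MODEL_LABELS, k=2)
probs = {
    "vlm_a": {"cat": 0.7, "car": 0.3},
    "vlm_b": {"cat": 0.4, "car": 0.6},
    "vlm_c": {"cat": 0.2, "car": 0.8},
}
prediction = ensemble_predict(probs, chosen)
```

In this sketch the label table is built once, independently of any target task, so selection for a new task reduces to a lookup and a sort rather than a fresh evaluation of every model.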
Taxonomy
Topics: Multimodal Machine Learning Applications
