Transferable Model-agnostic Vision-Language Model Adaptation for Efficient Weak-to-Strong Generalization

Jihwan Park; Taehoon Song; Sanghyeok Lee; Miso Choi; Hyunwoo J. Kim

arXiv:2508.08604·cs.CV·January 21, 2026

TL;DR

This paper introduces TransMiter, a lightweight, model-agnostic adapter that enables efficient transfer of adaptation knowledge across vision-language models without backpropagation, improving generalization in visual recognition tasks.

Contribution

The paper proposes TransMiter, a lightweight adapter that captures the gap between pre-trained and fine-tuned VLMs without labels and transfers that adaptation knowledge to other models without backpropagation, reducing the cost of adapting larger models.

Findings

01. TransMiter effectively transfers adaptation knowledge across different VLMs.

02. Supplementing TransMiter with a few labeled samples improves performance beyond that of fine-tuned models.

03. TransMiter maintains generalization across models of various sizes and architectures.

Abstract

Vision-Language Models (VLMs) have been widely used in various visual recognition tasks due to their remarkable generalization capabilities. As these models grow in size and complexity, fine-tuning becomes costly, emphasizing the need to reuse adaptation knowledge from 'weaker' models to efficiently enhance 'stronger' ones. However, existing adaptation transfer methods exhibit limited transferability across models due to their model-specific design and high computational demands. To tackle this, we propose Transferable Model-agnostic adapter (TransMiter), a light-weight adapter that improves vision-language models 'without backpropagation'. TransMiter captures the knowledge gap between pre-trained and fine-tuned VLMs, in an 'unsupervised' manner. Once trained, this knowledge can be seamlessly transferred across different models without the need for backpropagation. Moreover, TransMiter…
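The idea of capturing a gap between a pre-trained and a fine-tuned model, then reusing it on another model, can be illustrated with a deliberately simple toy. The sketch below is not the TransMiter architecture: it assumes the gap is stored as a per-class logit offset averaged over unlabeled images, and that both models score the same set of classes so the offset can be reused as-is. All function and variable names are hypothetical.

```python
# Toy illustration only: a per-class logit offset stands in for the adapter.
# This is NOT the paper's method; every name here is hypothetical.

def fit_logit_offset(pretrained_logits, finetuned_logits):
    """Average per-class gap between fine-tuned and pre-trained logits.

    Both arguments are lists of per-image logit vectors from the weak model,
    computed on the same unlabeled images (no labels are used).
    """
    n = len(pretrained_logits)
    num_classes = len(pretrained_logits[0])
    offset = [0.0] * num_classes
    for pre, fin in zip(pretrained_logits, finetuned_logits):
        for c in range(num_classes):
            offset[c] += (fin[c] - pre[c]) / n
    return offset


def apply_offset(logits, offset):
    """Shift another model's logits by the stored gap (no gradients needed)."""
    return [value + shift for value, shift in zip(logits, offset)]


def predict(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda c: logits[c])


# Weak model: pre-trained vs. fine-tuned logits on two unlabeled images.
weak_pre = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0]]
weak_fin = [[2.0, 1.0, 2.0], [1.0, 2.0, 2.0]]
offset = fit_logit_offset(weak_pre, weak_fin)

# Stronger model: pre-trained logits for a new image, adapted by the offset.
strong_pre = [1.5, 1.0, 1.0]
strong_adapted = apply_offset(strong_pre, offset)
```

In this toy, the only thing carried from the weak model to the stronger one is a vector over classes, which is why the transfer step involves no gradient computation. A real adapter would need to model far richer, input-dependent differences than a constant shift.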
