Model Specific Task Similarity for Vision Language Model Selection via Layer Conductance
Wei Yang, Hong Xie, Tao Tan, Xin Li, Defu Lian, Enhong Chen

TL;DR
This paper introduces a method for selecting the best vision-language model for a specific task by analyzing internal model dynamics and task similarity. It outperforms existing approaches on diverse datasets.
Contribution
It proposes a new framework using layer conductance and directional divergence to predict model transferability without extensive evaluation.
Findings
Achieves a 14.7% improvement in NDCG@5 over state-of-the-art methods.
Effectively predicts model rankings across 48 VLMs and 21 datasets.
Outperforms existing selection baselines in diverse vision-language tasks.
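For readers unfamiliar with the headline metric, NDCG@5 measures how well a predicted ranking of models matches their true ordering within the top five positions. The sketch below is a minimal illustration, assuming linear gain and a log2 position discount; the exact NDCG variant used in the paper is not stated in this summary, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the first k items (linear gain, log2 discount).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(predicted_scores, true_relevance, k=5):
    # Rank candidate models by predicted score, highest first.
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    # The ideal ranking sorts by true relevance.
    ideal_dcg = dcg_at_k(sorted(true_relevance, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: six candidate models with made-up true accuracies.
true_acc = [0.91, 0.85, 0.80, 0.72, 0.60, 0.55]
good_scores = [6, 5, 4, 3, 2, 1]   # predicted order agrees with the true order
bad_scores = [1, 2, 3, 4, 5, 6]    # predicted order is reversed
print(ndcg_at_k(good_scores, true_acc), ndcg_at_k(bad_scores, true_acc))
```

A selection method that orders the candidates exactly as their true performance does scores 1.0; any misordering within the top five lowers the score.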
Abstract
While open-source Vision-Language Models (VLMs) have proliferated, selecting the optimal pretrained model for a specific downstream task remains challenging. Exhaustive evaluation is often infeasible due to computational constraints and data limitations in few-shot scenarios. Existing selection methods fail to fully address this: they either rely on data-intensive proxies or use symmetric textual descriptors that neglect the inherently directional and model-specific nature of transferability. To address this problem, we propose a framework that grounds model selection in the internal functional dynamics of the visual encoder. Our approach represents each task via layer-wise conductance and derives a target-conditioned block importance distribution through entropy-regularized alignment. Building on this, we introduce Directional Conductance Divergence (DCD), an asymmetric metric that…
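The abstract is cut off before DCD is defined, so the sketch below does not reproduce it. It only illustrates what "asymmetric" means for a divergence between two block-importance distributions, using KL divergence as a familiar stand-in. The distributions and the function name `kl_divergence` are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions over the same blocks.
    # eps guards against log(0); this is a stand-in, not the paper's DCD.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Made-up block-importance distributions for a source and a target task.
source = [0.7, 0.2, 0.1]
target = [0.4, 0.4, 0.2]

forward = kl_divergence(source, target)   # source -> target
backward = kl_divergence(target, source)  # target -> source
print(forward, backward)  # the two directions give different values
```

A symmetric descriptor would assign the same value to both directions; a directional measure can distinguish transfer from task A to task B from transfer in the opposite direction.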
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
