SOTA: Self-adaptive Optimal Transport for Zero-Shot Classification with Multiple Foundation Models
Zhanxuan Hu, Qiyu Xu, Yu Duan, Yonghang Tai, Huafeng Li

TL;DR
This paper introduces SOTA, a training-free ensemble method that adaptively combines multiple foundation models for zero-shot classification, improving performance across diverse domains by leveraging their complementary strengths.
Contribution
SOTA is a novel, prior-free, self-adaptive ensemble framework that integrates multiple foundation models using optimal transport without additional training.
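As a rough illustration of what combining model outputs through optimal transport can look like, the sketch below fuses per-model class probabilities and solves an entropy-regularised transport problem with Sinkhorn iterations. It is a generic example, not the authors' algorithm: the function names, the averaging step, the `eps` regulariser, and in particular the uniform class marginal are all assumptions made for this sketch, and a prior-free method such as SOTA may well avoid fixing that marginal.

```python
import numpy as np

def sinkhorn_plan(cost, row_marginal, col_marginal, eps=1.0, n_iters=200):
    """Entropy-regularised OT plan computed with plain Sinkhorn iterations."""
    kernel = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)     # rescale to match column marginal
        u = row_marginal / (kernel @ v)       # rescale to match row marginal
    return u[:, None] * kernel * v[None, :]

def ensemble_with_ot(prob_list, eps=1.0, n_iters=200):
    """Fuse per-model class probabilities (each N x C) into one transport plan.

    Hypothetical helper: simple averaging stands in for whatever adaptive
    weighting the paper actually uses.
    """
    mean_prob = np.mean(prob_list, axis=0)    # (N, C) averaged probabilities
    cost = -np.log(mean_prob + 1e-12)         # low cost = confident class
    n, c = cost.shape
    rows = np.full(n, 1.0 / n)                # each sample carries equal mass
    cols = np.full(c, 1.0 / c)                # ASSUMPTION: balanced classes
    plan = sinkhorn_plan(cost, rows, cols, eps, n_iters)
    return plan, plan.argmax(axis=1)          # plan and hard predictions
```

Calling `ensemble_with_ot([probs_a, probs_b])` with two `(N, C)` softmax outputs returns the `(N, C)` transport plan and one predicted class index per sample. No parameters are trained; only the scaling vectors `u` and `v` are iterated.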
Findings
SOTA yields significant performance improvements over the individual models it combines.
It is effective across natural image, medical, and remote sensing domains.
It automatically balances the contributions of the different models.
Abstract
Foundation models have attracted widespread attention across domains due to their powerful zero-shot classification capabilities. This work is motivated by two key observations: (1) Vision-Language Models (VLMs), such as CLIP, often over-rely on class-level textual priors and struggle to capture fine-grained visual cues, whereas Vision-only Foundation Models (VFMs), such as DINO, provide rich and discriminative visual features but lack semantic alignment; (2) the performance of different VLMs varies considerably across datasets owing to differences in pre-training. To address these challenges, we propose SOTA (Self-adaptive Optimal TrAnsport), a training-free ensemble framework that integrates the outputs of multiple foundation models (VFMs or VLMs) by learning a self-adaptive transport plan. Notably, SOTA is prior-free and…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications · COVID-19 diagnosis using AI
Methods: Contrastive Language-Image Pre-training
