SOTA: Self-adaptive Optimal Transport for Zero-Shot Classification with Multiple Foundation Models

Zhanxuan Hu; Qiyu Xu; Yu Duan; Yonghang Tai; Huafeng Li

arXiv:2506.13723·cs.CV·March 13, 2026

Open Access

TL;DR

This paper introduces SOTA, a training-free ensemble method that adaptively combines multiple foundation models for zero-shot classification, improving performance across diverse domains by leveraging their complementary strengths.

Contribution

SOTA is a novel, prior-free, self-adaptive ensemble framework that integrates multiple foundation models using optimal transport without additional training.

Findings

01. Significant performance improvements over individual models.

02. Effective across natural images, medical, and remote sensing domains.

03. Automatically balances contributions of different models.

Abstract

Foundation models have attracted widespread attention across domains due to their powerful zero-shot classification capabilities. This work is motivated by two key observations: (1) Vision-Language Models (VLMs), such as CLIP, often over-rely on class-level textual priors and struggle to capture fine-grained visual cues, whereas Vision-only Foundation Models (VFMs), such as DINO, provide rich and discriminative visual features but lack semantic alignment; (2) the performance of different VLMs varies considerably across datasets owing to differences in pre-training. To address these challenges, we propose SOTA (Self-adaptive Optimal TrAnsport), a training-free ensemble framework that integrates the outputs of multiple foundation models (VFMs or VLMs) by learning a self-adaptive transport plan. Notably, SOTA is prior-free and…
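As a rough illustration of the kind of computation the abstract alludes to, the sketch below fuses per-model class probabilities through an entropic optimal transport plan computed with plain Sinkhorn iterations. It is not the paper's algorithm: the function names (`sinkhorn_plan`, `ensemble_predict`), the equal weighting of models, the fixed uniform class marginal, and the `eps` and `n_iters` values are all assumptions made for illustration, whereas SOTA is described as self-adaptive and prior-free.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropic OT plan between samples (rows) and classes (columns).

    Assumption: both marginals are uniform. This is a generic Sinkhorn
    solver, not the self-adaptive plan described in the paper.
    """
    n, k = cost.shape
    r = np.full(n, 1.0 / n)  # uniform mass per sample
    c = np.full(k, 1.0 / k)  # uniform mass per class (an assumed prior)
    # Shift the cost before exponentiating to limit underflow.
    kernel = np.exp(-(cost - cost.min()) / eps)
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = r / (kernel @ v)      # rescale rows to match r
        v = c / (kernel.T @ u)    # rescale columns to match c
    return u[:, None] * kernel * v[None, :]


def ensemble_predict(prob_list, eps=0.1, n_iters=200):
    """Fuse several (n_samples, n_classes) probability matrices.

    Assumption: every model gets equal weight; the cost is the mean
    negative log-probability across models.
    """
    cost = -np.mean([np.log(p + 1e-12) for p in prob_list], axis=0)
    plan = sinkhorn_plan(cost, eps=eps, n_iters=n_iters)
    return plan.argmax(axis=1), plan
```

Each input matrix would hold one model's zero-shot class probabilities for the same batch of images; the returned plan can be read row by row as a soft assignment of samples to classes. The uniform class marginal in this sketch pushes predictions toward balanced classes, which is exactly the kind of prior a prior-free method would avoid.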

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Geophysical Methods and Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training