Local-Global Prompt Learning via Sparse Optimal Transport

Deniz Kizaroğlu; Ülku Tuncer Küçüktas; Emre Çakmakyurdu; Alptekin Temizel

arXiv:2603.08347 · cs.CV · March 10, 2026

Open Access

TL;DR

The paper introduces SOT-GLP, a novel prompt learning method that partitions visual regions among class prompts using optimal transport, improving few-shot classification and out-of-distribution detection in vision-language models.

Contribution

SOT-GLP is the first approach to explicitly partition salient visual regions among class prompts with balanced optimal transport, enhancing both accuracy and robustness.

Findings

1. Achieves 85.1% average accuracy across 11 benchmarks in the 16-shot setting with a ViT-B/16 backbone.

2. Reaches a state-of-the-art 94.2% AUC for out-of-distribution (OOD) detection, surpassing fully adapted models.

3. Outperforms prior prompt-learning methods in few-shot classification.

Abstract

Few-shot adaptation of vision-language models (VLMs) like CLIP typically relies on learning textual prompts matched to global image embeddings. Recent works extend this paradigm by incorporating local image-text alignment to capture fine-grained visual cues, yet these approaches often select local regions independently for each prompt, leading to redundant local feature usage and prompt overlap. We propose SOT-GLP, which introduces a shared sparse patch support and balanced optimal transport allocation to explicitly partition salient visual regions among class-specific local prompts while preserving global alignment. Our method learns shared global prompts and class-specific local prompts. The global branch maintains standard image-text matching for robust category-level alignment. The local branch constructs a class-conditioned sparse patch set using V-V attention and aligns it to…
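
To make the allocation idea in the abstract concrete, here is a minimal sketch of balanced optimal transport over a shared sparse patch set. It is an illustration under stated assumptions, not the authors' implementation: the saliency scores are random placeholders rather than V-V attention, the features are random unit vectors rather than CLIP embeddings, entropic Sinkhorn iterations are used as a standard stand-in solver, and all names (`select_sparse_support`, `sinkhorn`, `eps`, `k`) are hypothetical.

```python
import numpy as np


def select_sparse_support(saliency, k):
    """Return indices of the k most salient patches (one set shared by all prompts)."""
    return np.argsort(saliency)[::-1][:k]


def sinkhorn(cost, row_marginal, col_marginal, eps=0.1, n_iters=300):
    """Entropic-regularized balanced OT solved with Sinkhorn iterations."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(row_marginal)
    v = np.ones_like(col_marginal)
    for _ in range(n_iters):
        u = row_marginal / (kernel @ v)      # rescale rows toward row_marginal
        v = col_marginal / (kernel.T @ u)    # rescale columns toward col_marginal
    return u[:, None] * kernel * v[None, :]


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
num_patches, num_classes, dim, k = 196, 5, 64, 32

# Placeholder inputs (assumptions): random unit features and random saliency.
patch_feats = l2_normalize(rng.normal(size=(num_patches, dim)))
prompt_feats = l2_normalize(rng.normal(size=(num_classes, dim)))
saliency = rng.random(num_patches)

# Shared sparse support: every class prompt competes for the same k patches.
support = select_sparse_support(saliency, k)

# Cosine-distance cost between selected patches and class-specific local prompts.
cost = 1.0 - patch_feats[support] @ prompt_feats.T  # shape (k, num_classes)

# Balanced marginals: each patch carries equal mass, each prompt receives equal mass.
a = np.full(k, 1.0 / k)
b = np.full(num_classes, 1.0 / num_classes)
plan = sinkhorn(cost, a, b)

# Hard partition for inspection: assign each patch to the prompt with the most mass.
assignment = plan.argmax(axis=1)
```

Because the column marginals are fixed to be equal, no single prompt can absorb all salient patches, which is the property that discourages the redundant region usage described above. The smaller `eps` is, the closer the transport plan is to a hard partition.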

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis