Local-Global Prompt Learning via Sparse Optimal Transport
Deniz Kizaroğlu, Ülku Tuncer Küçüktas, Emre Çakmakyurdu, Alptekin Temizel

TL;DR
The paper introduces SOT-GLP, a prompt-learning method for vision-language models that uses sparse optimal transport to partition salient visual regions among class-specific prompts, improving few-shot classification and out-of-distribution (OOD) detection.
Contribution
SOT-GLP is the first approach to explicitly partition salient visual regions among class prompts with balanced optimal transport, enhancing both accuracy and robustness.
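To make the balanced allocation idea concrete, here is a minimal sketch of entropic-regularized balanced optimal transport solved with Sinkhorn iterations, assigning patch features to prompt embeddings. Everything in it is an illustrative assumption rather than the authors' formulation: the cost (1 minus cosine similarity), the uniform marginals, the regularization `eps`, the iteration count, and the hypothetical helpers `sinkhorn_balanced` and `assign_patches`.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def sinkhorn_balanced(cost, eps=0.1, n_iters=200):
    """Entropic balanced OT with uniform marginals (assumed setup).

    cost: N rows (patches) x K columns (prompts).
    Returns a plan T (N x K) whose columns sum to 1/K and rows to ~1/N.
    """
    n, k = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass carried by each patch (uniform, assumed)
    b = [1.0 / k] * k  # mass each prompt must receive (uniform => balanced)
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows to match patch marginals, then columns to match prompt marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]

def assign_patches(patches, prompts, eps=0.1):
    """Hypothetical helper: soft OT plan plus a hard patch-to-prompt partition."""
    cost = [[1.0 - cosine(p, q) for q in prompts] for p in patches]
    plan = sinkhorn_balanced(cost, eps=eps)
    # Hard partition: each patch goes to the prompt that receives most of its mass.
    hard = [max(range(len(prompts)), key=lambda j: row[j]) for row in plan]
    return plan, hard

plan, hard = assign_patches([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]], [[1, 0], [0, 1]])
print(hard)  # [0, 0, 1, 1]
```

Because each prompt's column marginal is fixed at 1/K, no single prompt can absorb all of the transport mass even when every patch is most similar to it; this is the sense in which a balanced plan partitions patches instead of letting prompts overlap. The hard `argmax` read-out is only one simple choice; a method could equally use the soft plan directly.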
Findings
Achieves 85.1% average accuracy across 11 benchmarks in the 16-shot setting with a ViT-B/16 backbone.
Achieves a state-of-the-art OOD detection AUC of 94.2%, surpassing fully adapted models.
Outperforms prior prompt-learning methods in few-shot classification.
Abstract
Few-shot adaptation of vision-language models (VLMs) like CLIP typically relies on learning textual prompts matched to global image embeddings. Recent works extend this paradigm by incorporating local image-text alignment to capture fine-grained visual cues, yet these approaches often select local regions independently for each prompt, leading to redundant local feature usage and prompt overlap. We propose SOT-GLP, which introduces a shared sparse patch support and balanced optimal transport allocation to explicitly partition salient visual regions among class-specific local prompts while preserving global alignment. Our method learns shared global prompts and class-specific local prompts. The global branch maintains standard image-text matching for robust category-level alignment. The local branch constructs a class-conditioned sparse patch set using V-V attention and aligns it to…
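The local branch described above can be sketched as follows, under heavy assumptions since the abstract is truncated. The sketch takes V-V attention to mean self-attention in which value vectors act as both queries and keys, softmax(V V^T / sqrt(d)) V, and uses the patch features directly as values (a real CLIP encoder would apply its learned value projection). Selecting the top `k` refined patches by cosine similarity to a class's local prompt embedding, and adding `lam` times their mean similarity to the global image-text score, are hypothetical choices; how the per-class sets form a shared support that optimal transport then partitions is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def vv_attention(values):
    """Value-value self-attention: softmax(V V^T / sqrt(d)) V (assumed form)."""
    d = len(values[0])
    out = []
    for vi in values:
        w = softmax([dot(vi, vj) / math.sqrt(d) for vj in values])
        out.append([sum(w[j] * values[j][c] for j in range(len(values))) for c in range(d)])
    return out

def sparse_patch_set(patch_feats, class_emb, k):
    """Hypothetical rule: indices of the k refined patches closest to a class embedding."""
    refined = [normalize(p) for p in vv_attention(patch_feats)]
    t = normalize(class_emb)
    scores = [dot(p, t) for p in refined]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top), refined

def class_logit(global_img, patch_feats, global_txt, local_txt, k=2, lam=0.5):
    """Hypothetical fusion: global cosine score + lam * mean local score over the sparse set."""
    g = dot(normalize(global_img), normalize(global_txt))
    idx, refined = sparse_patch_set(patch_feats, local_txt, k)
    t = normalize(local_txt)
    local = sum(dot(refined[i], t) for i in idx) / len(idx)
    return g + lam * local
```

For example, with two-dimensional toy features, `sparse_patch_set([[1, 0], [0.9, 0.2], [0, 1], [0.1, 1]], [1, 0], k=2)` keeps patches 0 and 1, the two whose refined features lie closest to the class direction, while the global term still carries the category-level match.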
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
