From Points to Clouds: Learning Robust Semantic Distributions for Multi-modal Prompts
Weiran Li, Yeqiang Liu, Yijie Wei, Mina Han, Xin Liu, Zhenbo Li

TL;DR
This paper introduces P2C, a framework that represents each prompt as a semantic cloud (a distribution over the embedding space) rather than a single point, improving robustness and generalization in multimodal prompt learning for visual language models.
Contribution
P2C reframes prompt learning as a denoising task inspired by diffusion models, enabling the learning of semantic clouds for improved robustness and generalization.
Findings
Outperforms strong baselines on 11 datasets
Achieves 79.7% harmonic mean on base-to-novel benchmark
Demonstrates improved robustness and generalization
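For readers unfamiliar with the metric: in base-to-novel evaluations, the harmonic mean is conventionally computed from the accuracy on base classes and the accuracy on novel classes, so a model cannot score well by excelling on only one of the two. The sketch below assumes that convention applies here. The function name and the example accuracies are illustrative and are not taken from the paper.

```python
def harmonic_mean(base_acc: float, novel_acc: float) -> float:
    """Harmonic mean of base-class and novel-class accuracy (both in percent)."""
    total = base_acc + novel_acc
    if total == 0:
        # Avoid division by zero when both accuracies are zero.
        return 0.0
    return 2.0 * base_acc * novel_acc / total


# Illustrative values only (not results from the paper):
# a large base/novel gap is penalized more than by the arithmetic mean (75.0).
print(harmonic_mean(90.0, 60.0))  # 72.0
```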
Abstract
Multimodal Prompt Learning (MPL) has emerged as a pivotal technique for adapting large-scale Visual Language Models (VLMs). However, current MPL methods are fundamentally limited by their optimization of a single, static point representation. This paradigm is inherently brittle, leads to overfitting on base classes, and generalizes poorly to novel or ambiguous categories. We challenge this point paradigm, proposing that robust generalization requires learning a semantic cloud (i.e., a distribution over the embedding space). To achieve this, we introduce Points-to-Clouds (P2C), a novel framework inspired by diffusion models that reframes prompt learning as a dynamic denoising task. At the core of P2C is a dual denoising mechanism: a Dynamic Prompt Denoising (DPD) mechanism perturbs text prompts with sophisticated, annealed noise to learn a smoother semantic landscape, while an auxiliary…
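The idea of perturbing a prompt embedding with annealed noise can be sketched roughly as follows. This is a minimal illustration, not the paper's DPD implementation: the abstract does not specify the noise distribution or the schedule, so the Gaussian noise, the cosine annealing, and the names `annealed_sigma`, `perturb_prompt`, `sigma_max`, and `sigma_min` are all assumptions made for this example.

```python
import math
import random


def annealed_sigma(step, total_steps, sigma_max=0.1, sigma_min=0.0):
    """Cosine-annealed noise scale (assumed schedule).

    Returns sigma_max at step 0 and sigma_min at the final step.
    """
    progress = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return sigma_min + 0.5 * (sigma_max - sigma_min) * (1.0 + math.cos(math.pi * progress))


def perturb_prompt(prompt_embedding, step, total_steps, rng=None):
    """Return a noisy copy of a prompt embedding (a list of floats).

    Sampling many perturbed copies around one learned vector is one simple
    way to treat a prompt as a "cloud" instead of a single point.
    """
    rng = rng or random.Random()
    sigma = annealed_sigma(step, total_steps)
    return [x + rng.gauss(0.0, sigma) for x in prompt_embedding]


# Hypothetical 4-dimensional prompt embedding, perturbed early in training.
rng = random.Random(0)
noisy = perturb_prompt([0.5, -0.2, 0.1, 0.9], step=0, total_steps=100, rng=rng)
```

Under this assumed schedule, early steps see the largest perturbations and the noise vanishes by the last step, so training ends on the unperturbed embedding.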
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
