Prompt Diffusion Robustifies Any-Modality Prompt Learning
Yingjun Du, Gaowen Liu, Yuzhang Shang, Yuguang Yao, Ramana Kompella, Cees G. M. Snoek

TL;DR
This paper introduces prompt diffusion, a diffusion-based method that refines prompts for foundation models, enhancing robustness and generalization across modalities and datasets in zero-shot and few-shot learning scenarios.
Contribution
It proposes a novel prompt diffusion framework that generates customized prompts via a diffusion process, improving prompt learning robustness without requiring label access during inference.
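To show what "generating customized prompts via a diffusion process" could look like, here is the standard DDPM-style formulation applied to prompt vectors. This is an illustrative assumption, not the paper's stated objective. π_0 denotes the per-sample overfitted prompt, z a feature of the input sample used for conditioning, f_θ a denoiser that predicts the clean prompt, and β_t a noise schedule. The paper may use a different parameterization, such as noise prediction.

```latex
% Assumed DDPM-style formulation over prompt vectors (illustrative, not taken from the paper)
\begin{aligned}
\bar\alpha_t &= \prod_{s=1}^{t} (1 - \beta_s), \\
q(\pi_t \mid \pi_0) &= \mathcal{N}\!\left(\pi_t;\ \sqrt{\bar\alpha_t}\,\pi_0,\ (1 - \bar\alpha_t)\,\mathbf{I}\right), \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\ \epsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})}
  \left\| \pi_0 - f_\theta\!\left(\sqrt{\bar\alpha_t}\,\pi_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon,\ t,\ \mathbf{z}\right) \right\|_2^2 .
\end{aligned}
```

Under this formulation, labels are needed only to obtain π_0 during training. At inference, sampling starts from a random prompt π_T ~ N(0, I) and is conditioned only on z.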
Findings
Improves base-to-new generalization in prompt learning.
Enhances cross-dataset and domain generalization.
Achieves robust performance across 15 diverse datasets.
Abstract
Foundation models enable prompt-based classifiers for zero-shot and few-shot learning. Nonetheless, the conventional practice of using fixed prompts suffers from distributional shifts that hurt generalization to unseen samples. This paper introduces prompt diffusion, which uses a diffusion model to gradually refine prompts and obtain a customized prompt for each sample. Specifically, we first optimize a collection of prompts to obtain overfitted prompts per sample. Then, we propose a prompt diffusion model within the prompt space, which learns a generative transition process from a random prompt to its overfitted prompt. Because the label of a test image is unavailable during inference, our model gradually generates customized prompts solely from random prompts using the trained prompt diffusion model. Our prompt diffusion is generic, flexible, and…
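The inference stage described above (generating a customized prompt from a random prompt without a label) can be sketched as a deterministic DDIM-style sampler (η = 0) over a prompt vector. This is a minimal sketch under stated assumptions, not the authors' implementation. The cosine noise schedule, the `denoiser(prompt, t, image_feature)` signature that predicts the clean (overfitted) prompt, and the names `cosine_alpha_bars` and `sample_prompt` are all hypothetical. Prompts are plain Python lists, so the sketch runs without external dependencies.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical denoiser: (noisy prompt, step index, image feature) -> predicted clean prompt.
Denoiser = Callable[[Vector, int, Sequence[float]], Vector]


def cosine_alpha_bars(num_steps: int, s: float = 0.008) -> List[float]:
    """Cumulative signal levels alpha_bar_t for steps t = 1..num_steps (cosine schedule)."""

    def f(t: int) -> float:
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2

    # Entry i holds alpha_bar for step i + 1; values decrease from just below 1 to about 0.
    return [f(t) / f(0) for t in range(1, num_steps + 1)]


def sample_prompt(
    denoiser: Denoiser,
    image_feature: Sequence[float],
    dim: int,
    num_steps: int = 50,
    seed: int = 0,
) -> Vector:
    """Generate a prompt for one test sample, starting from a random prompt (no label used)."""
    rng = random.Random(seed)
    alpha_bars = cosine_alpha_bars(num_steps)
    # Random initial prompt drawn from a standard Gaussian.
    prompt = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(num_steps)):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[t - 1] if t > 0 else 1.0  # the clean prompt has alpha_bar = 1
        clean = denoiser(prompt, t, image_feature)
        # Noise implied by the current prompt and the predicted clean prompt.
        eps = [
            (p - math.sqrt(a_t) * c) / math.sqrt(1.0 - a_t)
            for p, c in zip(prompt, clean)
        ]
        # Deterministic DDIM update (eta = 0): re-noise the prediction to level a_prev.
        prompt = [
            math.sqrt(a_prev) * c + math.sqrt(1.0 - a_prev) * e
            for c, e in zip(clean, eps)
        ]
    return prompt
```

In practice `denoiser` would be a trained network. Conditioning it on `image_feature` is what makes the generated prompt specific to each test sample. The deterministic DDIM step is only one choice for this illustration. A stochastic DDPM sampler would follow the same loop structure.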
Peer Reviews
Decision · Submitted to ICLR 2025
1. The method in this paper generates customized prompts for each sample by gradually optimizing the prompts through diffusion, which enhances the accuracy of prediction and generalization across downstream tasks. 2. The diffusion prompting method in this paper is a plug-and-play module that can be seamlessly integrated into existing textual, visual, or multimodal prompt learning methods. 3. The method in this paper improves the prompt learning process by efficiently extracting unique domain det
1. The authors' method requires stepwise optimization of the prompts and may need several iterations to obtain optimal results. In addition, introducing a diffusion model increases the complexity of the system, so the training time may be relatively long. 2. It is unclear whether the authors' approach is a two-stage process, in which prompt learning is performed first, followed by diffusion of the prompts, so that the final model performance relies on the goodness of the prev
1. Experiments have shown that the proposed method outperforms baseline methods. 2. The overall idea is intuitive and straightforward, addressing the limitations of fixed prompts by leveraging diffusion models to generate over-fitted prompts per sample, which enhances model robustness against distribution shifts.
1. Considering that the proposed method operates per sample during training, does it introduce a significantly larger computational load than conventional prompt learning methods? Can a comparative analysis be provided to address this concern? 2. While the proposed method is plug-and-play and the pipeline figure is demonstrated on CoCoOp, it would be beneficial to include sections addressing visual prompt tuning and multi-modal prompt tuning. Additionally, the method em
1. Introduces an innovative, modality-agnostic diffusion process that significantly enhances robustness in prompt-based learning. 2. Demonstrates consistent empirical improvements across various prompt learning tasks, supporting the efficacy of diffusion models. 3. Efficient design reduces inference time, making it suitable for diverse real-world applications.
1. The paper does not fully articulate the specific limitations of SOTA prompt methods in adapting to distributional shifts in data, which leaves it unclear how critical these issues are in broader prompt-learning applications. To make this critique more actionable, the authors could quantify the performance degradation these shifts cause in existing methods, to better contextualize the importance of their contribution. Specific examples are not enough to illustrate the
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Neural Networks and Applications
Methods: Diffusion
