A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr

TL;DR
This paper provides a comprehensive survey of prompt engineering techniques applied to vision-language models, covering methods, applications, and challenges across various model types to guide future research.
Contribution
It systematically reviews recent advances in prompt engineering for vision-language models, highlighting differences from language and vision models and discussing future research directions.
Findings
Summarizes prompting methods for multimodal-to-text, image-text matching, and text-to-image models.
Identifies key challenges and ethical issues in prompt engineering.
Outlines future research opportunities in the field.
Abstract
Prompt engineering is a technique that involves augmenting a large pre-trained model with task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be created manually as natural language instructions or generated automatically as either natural language instructions or vector representations. Prompt engineering makes it possible to perform predictions based solely on prompts, without updating model parameters, and eases the application of large pre-trained models to real-world tasks. In past years, prompt engineering has been well studied in natural language processing. Recently, it has also been intensively studied in vision-language modeling. However, a systematic overview of prompt engineering on pre-trained vision-language models is currently lacking. This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
