Understanding prompt engineering may not require rethinking generalization
Victor Akinwande, Yiding Jiang, Dylan Sam, J. Zico Kolter

TL;DR
This paper uses classical PAC-Bayes bounds to explain why prompt engineering in vision-language models generalizes well: for both handcrafted and automatically generated prompts, the bounds are often tight and close to the actual test error.
Contribution
It introduces a PAC-Bayes-based theoretical framework to justify the generalization ability of prompt engineering in zero-shot learning.
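To make the idea concrete, here is a minimal sketch of how a bound of this kind can be computed for a single discrete prompt. It assumes the simplest classical form (an Occam-style bound obtained from Hoeffding's inequality and a union bound weighted by the prior), which is not necessarily the exact variant used in the paper. The function name `occam_bound`, the per-token log-probabilities, the training error, and the sample size are all hypothetical values chosen for illustration.

```python
import math


def occam_bound(train_error, log_prior, n, delta=0.05):
    """Upper bound on the test error of one discrete hypothesis (a prompt).

    With probability at least 1 - delta over the draw of n i.i.d. training
    examples, simultaneously for every hypothesis h with prior mass P(h):
        test_error(h) <= train_error(h) + sqrt((-log P(h) + log(1/delta)) / (2n))
    log_prior is the natural log of P(h).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n <= 0:
        raise ValueError("n must be positive")
    if log_prior > 0.0:
        raise ValueError("log_prior must be <= 0 (it is a log-probability)")
    complexity = -log_prior + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2.0 * n))


# Hypothetical per-token log-probabilities that a language model assigns
# to the tokens of a prompt; their sum is the prompt's log prior.
token_log_probs = [-2.0, -1.5, -3.0, -2.5]
log_prior = sum(token_log_probs)

# Hypothetical training error and training-set size.
bound = occam_bound(train_error=0.30, log_prior=log_prior, n=1_000_000)
print(f"log prior = {log_prior:.1f}, test error bound = {bound:.4f}")
```

The complexity term grows only with the negative log-probability of the prompt, so a short, natural-sounding prompt evaluated on a large training set adds very little on top of its training error. That is the intuition behind the tightness reported in the summary.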
Findings
PAC-Bayes bounds are tight for prompts, often within a few percentage points of true test error.
Empirically, the test errors of both handcrafted prompts and prompts found by greedy search stay close to the theoretical bounds.
Model selection based on bounds correlates with actual test performance.
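The last finding, selecting a prompt by its bound rather than by its training error alone, can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's procedure: it uses a simple Occam-style bound (Hoeffding plus a prior-weighted union bound), and the candidate prompts, their training errors, their log priors, and the helper name `generalization_bound` are all hypothetical.

```python
import math


def generalization_bound(train_error, log_prior, n, delta):
    """Occam-style bound: train error plus a prior-dependent complexity term."""
    complexity = -log_prior + math.log(1.0 / delta)
    return train_error + math.sqrt(complexity / (2.0 * n))


# Hypothetical candidates: (prompt template, training error, log prior under a
# language model). Longer, less likely prompts have a more negative log prior.
candidates = [
    ("a photo of a {}", 0.32, -8.0),
    ("a blurry close-up photo of the {} taken outdoors", 0.31, -40.0),
    ("{}", 0.40, -1.0),
]

n = 10_000    # hypothetical number of labeled training examples
delta = 0.05  # the bound holds with probability at least 1 - delta

# Score every candidate by its bound and keep the smallest one.
scored = [
    (generalization_bound(err, log_prior, n, delta), prompt)
    for prompt, err, log_prior in candidates
]
best_bound, best_prompt = min(scored)

for value, prompt in sorted(scored):
    print(f"{value:.4f}  {prompt!r}")
print("selected:", best_prompt)
```

In this toy setup the longest prompt has the lowest training error but is not selected, because its complexity term outweighs the small gain in fit. Since a bound of this form holds simultaneously for all prompts, choosing the minimizer does not invalidate the guarantee.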
Abstract
Zero-shot learning in prompted vision-language models, the practice of crafting prompts to build classifiers without an explicit training process, has achieved impressive performance in many settings. This success presents a seemingly surprising observation: these methods suffer relatively little from overfitting, i.e., when a prompt is manually engineered to achieve low error on a given training set (thus rendering the method no longer actually zero-shot), the approach still performs well on held-out test data. In this paper, we show that we can explain such performance well via recourse to classical PAC-Bayes bounds. Specifically, we show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature: for instance, the generalization bound of an ImageNet…
Taxonomy
Topics: AI-based Problem Solving and Planning
