Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp

TL;DR
This paper shows that the order of the examples in a few-shot prompt strongly affects the performance of large language models, introduces a method for identifying effective orderings without additional annotated data, and reports a 13% relative improvement with it.
Contribution
The paper shows that few-shot performance is sensitive to the order of the prompt examples, analyzes this sensitivity across models, and proposes an entropy-based method for finding good orderings without additional annotated data.
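The idea of ranking orderings by entropy can be illustrated with a minimal sketch. It is not the authors' implementation: the `predict` callable, the toy predictor, and the probe inputs are hypothetical, and the choice to prefer orderings whose predicted labels are more balanced (higher entropy) is stated here as an assumption.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)


def label_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_orderings(
    examples: Sequence[Example],
    probes: Sequence[str],
    predict: Callable[[Sequence[Example], str], str],
) -> List[Tuple[float, Tuple[Example, ...]]]:
    """Score every ordering of `examples` by the entropy of its predictions.

    `predict` is a hypothetical wrapper around a language model: given an
    ordered list of demonstrations and a probe input, it returns a label.
    No gold labels for the probes are needed.
    """
    scored = []
    for order in permutations(examples):
        predictions = [predict(order, probe) for probe in probes]
        scored.append((label_entropy(predictions), order))
    # Assumption: higher entropy means less degenerate (one-label) behaviour.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_predict(order: Sequence[Example], probe: str) -> str:
    """Toy stand-in for a model: copies the last demonstration's label
    unless the probe contains an obvious positive cue."""
    if "great" in probe:
        return "positive"
    return order[-1][1]


examples = [
    ("great film", "positive"),
    ("dull plot", "negative"),
    ("loved it", "positive"),
]
probes = ["great acting", "boring", "fine", "okay"]
ranked = rank_orderings(examples, probes, toy_predict)
best_entropy, best_order = ranked[0]
```

With the toy predictor, orderings that end in the negative example produce a mix of labels and rank first, while orderings that end in a positive example predict one label for every probe and score zero. Enumerating all permutations is only feasible for a handful of examples, since the count grows factorially.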
Findings
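The order of the few-shot examples can make the difference between near state-of-the-art and random-guess performance.
The phenomenon is present across model sizes, including the largest current models.
The proposed method yields a 13% relative improvement.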
Abstract
When primed with only a handful of training samples, very large, pretrained language models such as GPT-3 have shown competitive results when compared to fully-supervised, fine-tuned, large, pretrained language models. We demonstrate that the order in which the samples are provided can make the difference between near state-of-the-art and random guess performance: essentially some permutations are "fantastic" and some not. We analyse this phenomenon in detail, establishing that: it is present across model sizes (even for the largest current models), it is not related to a specific subset of samples, and that a given good permutation for one model is not transferable to another. While one could use a development set to determine which permutations are performant, this would deviate from the true few-shot setting as it requires additional annotated data. Instead, we use the generative…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Layer Normalization · Residual Connection · Weight Decay · Byte Pair Encoding · Dropout
