Order Matters: Rethinking Prompt Construction in In-Context Learning
Warren Li, Yiqian Wang, Zihan Wang, Jingbo Shang

TL;DR
This paper shows that the order of examples in an in-context learning (ICL) prompt significantly affects model performance, with an effect comparable to that of example selection, and proposes methods for identifying strong orderings using a development set.
Contribution
It challenges the common assumption that, in ICL, which examples are selected matters far more than how they are ordered, and it systematically compares the two effects across multiple models and tasks.
Findings
The performance variance across different orderings of the same examples is comparable to the variance across entirely different example sets.
Strong orderings can be identified using only development sets.
Prompt design should consider both example selection and ordering.
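The dev-set finding can be illustrated with a minimal sketch that scores every permutation of a small demonstration set on held-out development examples and keeps the best one. This assumes exhaustive search and plain accuracy as the selection criterion; this page does not describe the paper's actual search procedure or scoring rule. All names (`build_prompt`, `dev_accuracy`, `best_ordering`, `toy_predict`) and the prompt format are hypothetical, and `toy_predict` is a made-up stand-in for a real model call.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

# (input text, label) pairs; a hypothetical format chosen for illustration
Example = Tuple[str, str]


def build_prompt(demos: Sequence[Example], query: str) -> str:
    """Concatenate the demonstrations in the given order, then append the query."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


def dev_accuracy(order: Sequence[Example], dev_set: Sequence[Example],
                 predict: Callable[[str], str]) -> float:
    """Fraction of development examples labeled correctly under this ordering."""
    hits = sum(predict(build_prompt(order, x)) == y for x, y in dev_set)
    return hits / len(dev_set)


def best_ordering(demos: Sequence[Example], dev_set: Sequence[Example],
                  predict: Callable[[str], str]) -> Tuple[List[Example], float]:
    """Score every permutation on the dev set and keep the best one.

    k demonstrations give k! permutations, so this is only feasible for small k.
    Ties keep the earliest permutation encountered.
    """
    best_order: List[Example] = list(demos)
    best_acc = -1.0
    for perm in itertools.permutations(demos):
        acc = dev_accuracy(perm, dev_set, predict)
        if acc > best_acc:
            best_order, best_acc = list(perm), acc
    return best_order, best_acc


def toy_predict(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: copies the label of the last
    demonstration, an exaggerated recency bias that makes accuracy order-dependent."""
    labels = [line[len("Label: "):] for line in prompt.splitlines()
              if line.startswith("Label: ")]
    return labels[-1] if labels else ""


if __name__ == "__main__":
    demos = [("great movie", "pos"), ("awful plot", "neg"), ("loved it", "pos")]
    dev = [("fun ride", "pos"), ("boring", "neg"), ("brilliant", "pos")]
    order, acc = best_ordering(demos, dev, toy_predict)
    print(order, round(acc, 3))  # orderings ending in a "pos" demo reach 0.667
```

With a real model, `predict` would wrap an API or local inference call, and for larger k one would sample permutations rather than enumerate all k! of them.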
Abstract
In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work reasonably and intuitively assumes that which examples are chosen has a far greater effect on performance than how those examples are ordered, leading to a focus on example selection. We revisit this assumption and conduct a systematic comparison between the effect of selection and ordering. Through controlled experiments on both classification and generation tasks, using multiple open-source model families (0.5B to 27B parameters) and GPT-5, we find that the variance in performance due to different example orderings is comparable to that from using entirely different example sets. Furthermore, we show that strong orderings can be identified using only a development set, achieving performance close to an oracle that selects the best ordering based on…
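One way to make "variance due to ordering versus variance due to selection" concrete is a law-of-total-variance split: draw several example sets, score every ordering of each, then compare the average within-set variance (ordering) with the variance of per-set mean scores (selection). This decomposition is an assumption made for the sketch, not necessarily the metric the paper reports. `order_vs_selection_variance` and `toy_score` are hypothetical names, and `toy_score` replaces a real dev-set evaluation with a made-up position-weighted quality score.

```python
import itertools
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def order_vs_selection_variance(
    pool: Sequence[float],
    k: int,
    n_sets: int,
    score: Callable[[Sequence[float]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Split score variance into an ordering part and a selection part.

    For each of n_sets randomly drawn k-example sets, score every ordering.
    ordering variance  = mean over sets of the variance across that set's orderings
    selection variance = variance across sets of each set's mean score
    With equal group sizes (k! orderings per set), the two terms sum to the
    total population variance of all scores (law of total variance).
    """
    rng = random.Random(seed)
    within: List[float] = []
    set_means: List[float] = []
    for _ in range(n_sets):
        demo_set = rng.sample(list(pool), k)
        scores = [score(p) for p in itertools.permutations(demo_set)]
        within.append(statistics.pvariance(scores))
        set_means.append(statistics.mean(scores))
    return statistics.mean(within), statistics.pvariance(set_means)


def toy_score(order: Sequence[float]) -> float:
    """Hypothetical stand-in for dev-set accuracy: each item is a made-up
    'example quality', and later positions get larger weights (a recency effect)."""
    weights = list(range(1, len(order) + 1))
    return sum(w * q for w, q in zip(weights, order)) / sum(weights)


if __name__ == "__main__":
    pool = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3]
    ov, sv = order_vs_selection_variance(pool, k=3, n_sets=20, score=toy_score)
    print(f"ordering variance: {ov:.4f}  selection variance: {sv:.4f}")
```

An order-invariant score (for example, the plain sum of qualities) drives the ordering term to zero, which is the implicit assumption behind selection-only prompt design.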
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
