TL;DR
This paper uncovers a positional bias in in-context learning where the placement of demonstrations within prompts significantly impacts model accuracy and stability, with demos at the start generally yielding better results.
Contribution
It systematically studies the positional bias in ICL, introduces metrics to quantify it, and demonstrates its significant effect across multiple tasks and models.
Findings
Demos at prompt start improve accuracy by up to 6 points.
Placing demos at the end can flip over 30% of predictions.
Smaller models are more sensitive to demo placement.
Abstract
In-context learning (ICL) is a critical emerging capability of large language models (LLMs), enabling few-shot learning during inference by including a few demonstrations (demos) in the prompt. However, it has been found that ICL's performance can be sensitive to the choices of demos and their order. This paper investigates an unexplored new positional bias of ICL for the first time: we observe that the predictions and accuracy can drift drastically when the positions of demos, the system prompt, and the user message in LLM input are varied. We refer to this bias as DEMOS' POSITION IN PROMPT (DPP) bias. We design a systematic evaluation pipeline to study this type of positional bias across classification, question answering, summarization, and reasoning tasks. We introduce two metrics, ACCURACY-CHANGE and PREDICTION-CHANGE, to quantify net gains and output volatility induced by changes…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
