In-Context Fixation: When Demonstrated Labels Override Semantics in Few-Shot Classification
Ming Liu

TL;DR
This paper shows that in-context learning fixates on the labels shown in demonstrations: the model treats them as its entire answer vocabulary, which overrides semantics and collapses accuracy when the labels are homogeneous.
Contribution
It identifies the mechanism behind label fixation in in-context learning: the model binds its output to the demonstrated label tokens regardless of semantic plausibility.
Findings
Homogeneous demonstration labels, even semantically valid ones, reduce accuracy to <=12% across six models and four tasks.
Demonstrations with varied nonsense tokens cause the model to fixate on the demonstrated set, which receives 42--67% of the output probability.
Mechanistic analysis localizes fixation to a specific layer and recovers most of the effect.
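The two demonstration conditions in the findings above can be sketched as follows. This is a minimal illustration, not the paper's actual setup: the `Input:`/`Label:` template and the helper names `build_prompt`, `homogeneous_labels`, and `nonsense_labels` are assumptions introduced here.

```python
import random

def build_prompt(demos, labels, query):
    """Join (input, label) demonstrations and leave the query's label slot open.

    The "Input:/Label:" template is a hypothetical format chosen for illustration.
    """
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in zip(demos, labels)]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

def homogeneous_labels(n, label):
    """Every demonstration carries the same label (the collapsed case)."""
    return [label] * n

def nonsense_labels(n, inventory, seed=0):
    """Each demonstration carries a token sampled from a nonsense inventory."""
    rng = random.Random(seed)
    return [rng.choice(inventory) for _ in range(n)]

demos = ["It barks and fetches.", "It purrs on the sofa.", "It wags its tail."]
inventory = ["foo", "bar", "vex", "nit", "orb"]  # token set quoted in the abstract

homogeneous_prompt = build_prompt(demos, homogeneous_labels(3, "cat"), "It chases the mailman.")
nonsense_prompt = build_prompt(demos, nonsense_labels(3, inventory), "It chases the mailman.")
```

Feeding either prompt to a language model and inspecting the next-token distribution at the open label slot is how one could probe for fixation under these assumptions.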
Abstract
While random demonstration labels barely hurt in-context learning (Min et al., 2022), we show that homogeneous labels--even semantically valid ones--collapse accuracy to <=12% across six models (Pythia, Llama, Qwen; 0.8B--8B) and four tasks. The trigger is label-slot content: the model treats tokens occupying the label position as an exhaustive answer vocabulary, with homogeneity as the maximally collapsed case. A novel set-level fixation finding confirms this: when demonstrations carry varied nonsense tokens from {foo,bar,vex,nit,orb}, the model places 42--67% of probability on the demonstrated set while P(dog) remains below 0.2%. This is inconsistent with latent-concept Bayesian accounts (Xie et al., 2022) and reveals that ICL output is constrained vocabulary retrieval--the model binds its output to the demonstrated token inventory regardless of semantic plausibility. The effect…
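The set-level measurement described in the abstract, the share of next-token probability that lands on the demonstrated tokens, can be sketched as below. The function names and the toy logits are assumptions for illustration only; they are not the paper's code or data, and a real measurement would use the model's full-vocabulary logits at the label position.

```python
import math

def softmax(logits):
    """Convert a {token: logit} mapping into a {token: probability} mapping."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def set_mass(logits, token_set):
    """Total probability assigned to tokens in token_set."""
    probs = softmax(logits)
    return sum(p for tok, p in probs.items() if tok in token_set)

# Made-up logits over a tiny vocabulary, purely to exercise the functions.
toy_logits = {"foo": 2.0, "bar": 1.5, "vex": 1.0, "nit": 0.5, "orb": 0.5,
              "dog": -3.0, "the": 1.0}
demonstrated = {"foo", "bar", "vex", "nit", "orb"}

mass_on_demonstrated = set_mass(toy_logits, demonstrated)
mass_on_dog = set_mass(toy_logits, {"dog"})
```

Comparing `mass_on_demonstrated` with `mass_on_dog` mirrors the contrast the abstract draws between the demonstrated set and the semantically correct answer.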
