Understanding the Dynamics of Demonstration Conflict in In-Context Learning
Difan Jiao, Di Wang, Lijie Hu

TL;DR
This paper investigates how large language models process conflicting demonstrations in in-context learning. It reveals a two-phase internal structure and identifies the attention heads responsible for reasoning failures; ablating those heads improves performance.
Contribution
The paper uncovers a two-phase internal processing structure in models handling conflicting evidence, identifies the specific attention heads responsible for the vulnerability, and proposes targeted ablation of those heads as a mitigation.
Findings
Models encode both correct and incorrect rules in intermediate layers.
Prediction confidence develops only in late layers.
Ablation of key attention heads improves performance by over 10%.
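To make the last finding concrete, here is a minimal sketch of what ablating an attention head can mean mechanically. It assumes zero-ablation (replacing a head's output with zeros before the output projection); the summary does not say which ablation variant the authors use. The function name `combine_heads`, the tensor shapes, and the random toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def combine_heads(head_outputs, w_o, ablate=()):
    """Sum per-head contributions to the residual stream.

    head_outputs: (n_heads, d_head) per-head attention outputs at one position
    w_o:          (n_heads, d_head, d_model) per-head slices of the output projection
    ablate:       iterable of head indices to zero out (assumed zero-ablation)
    """
    mask = np.ones(head_outputs.shape[0])
    for h in ablate:
        mask[h] = 0.0
    masked = head_outputs * mask[:, None]
    # Contract over heads and head dimension to get a d_model vector.
    return np.einsum("hd,hdm->m", masked, w_o)

# Toy data: 4 heads, head dimension 8, model dimension 16.
rng = np.random.default_rng(0)
heads = rng.normal(size=(4, 8))
w_o = rng.normal(size=(4, 8, 16))

full = combine_heads(heads, w_o)
ablated = combine_heads(heads, w_o, ablate=[2])

# The difference is exactly the contribution that head 2 wrote.
head2_contribution = heads[2] @ w_o[2]
```

In a real model the same masking would be applied inside a specific layer's attention module, and the effect would be measured as the change in task accuracy on prompts with a corrupted demonstration.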
Abstract
In-context learning enables large language models to perform novel tasks through few-shot demonstrations. However, demonstrations themselves can contain noise and conflicting examples, making this capability vulnerable. To understand how models process such conflicts, we study demonstration-dependent tasks requiring models to infer underlying patterns, a process we characterize as rule inference. We find that models suffer substantial performance degradation from a single demonstration with a corrupted rule. This systematic misleading behavior motivates our investigation of how models process conflicting evidence internally. Using linear probes and logit lens analysis, we discover that under corruption, models encode both correct and incorrect rules in intermediate layers but develop prediction confidence only in late layers, revealing a two-phase computational structure. We then…
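The logit lens mentioned in the abstract can be sketched as follows: each layer's hidden state is passed through the model's final normalization and unembedding matrix, giving a layer-by-layer view of which output token the model currently favors. This is a toy illustration on random arrays, not the authors' code. The names `logit_lens`, `correct_id`, and `corrupted_id`, the shapes, and the simplified layer norm without learned scale or bias are all assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Simplified layer norm without learned scale and bias (an assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_lens(hidden_states, unembed):
    """Project every layer's hidden state into vocabulary space.

    hidden_states: (n_layers, d_model) residual stream at the answer position
    unembed:       (d_model, vocab) unembedding matrix
    returns:       (n_layers, vocab) per-layer token probabilities
    """
    return softmax(layer_norm(hidden_states) @ unembed)

# Toy data standing in for a real model's activations.
rng = np.random.default_rng(0)
n_layers, d_model, vocab = 6, 16, 10
hidden_states = rng.normal(size=(n_layers, d_model))
unembed = rng.normal(size=(d_model, vocab))

probs = logit_lens(hidden_states, unembed)

# Hypothetical token ids for the answer under the correct rule and under
# the corrupted rule; the per-layer gap tracks how confidence develops.
correct_id, corrupted_id = 3, 7
confidence_gap = probs[:, correct_id] - probs[:, corrupted_id]
```

With real activations, a gap that stays near zero through the intermediate layers and separates only in the late layers would match the two-phase structure the abstract describes.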
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Topic Modeling
