Multi-Layer Attention is the Amplifier of Demonstration Effectiveness
Dingzirui Wang, Xuangliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng

TL;DR
This paper investigates why some demonstrations in in-context learning are ineffective, reveals that multi-layer models amplify differences in demonstration effectiveness, and introduces GradS, a gradient-based demonstration selection method that improves performance.
Contribution
The paper provides a theoretical analysis of demonstration effectiveness, shows how multi-layer models amplify effectiveness disparities, and proposes GradS for better demonstration selection based on gradient flow.
Findings
The disparity in effectiveness among demonstrations grows as the number of model layers increases.
GradS improves demonstration selection by leveraging gradient flow.
Experimental validation shows GradS outperforms baselines by 6.8% on average.
Abstract
Numerous studies have investigated the underlying mechanisms of in-context learning (ICL) effectiveness to inspire the design of related methods. However, existing work predominantly assumes that the demonstrations provided in ICL are effective, whereas much research indicates that not all demonstrations are effective, with some failing to yield any performance improvement during ICL. Therefore, in this paper, we investigate the reasons behind demonstration ineffectiveness. Our analysis is based on gradient flow and linear self-attention models. By setting the gradient flow to zero, we deduce that a demonstration becomes ineffective if its information has either been learned by the model or is irrelevant to the user query. Furthermore, we demonstrate that in multi-layer models, the disparity in effectiveness among demonstrations is amplified as the number of layers increases, causing the model to…
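The two conditions for ineffectiveness named in the abstract can be illustrated with a toy example. The sketch below is not the paper's derivation and not GradS. It assumes a simplification that is common in the ICL literature, in which a single linear self-attention layer is treated as one gradient-descent step on a linear regression over the demonstrations. The function name `demo_contributions`, the weights `w0`, the step size `lr`, and all data values are hypothetical, chosen only for illustration. Under this assumption, each demonstration shifts the query prediction by its residual times its overlap with the query, so the shift vanishes when the residual is zero (already learned) or the overlap is zero (irrelevant to the query).

```python
import numpy as np

def demo_contributions(X, y, w0, x_query, lr=0.1):
    """Per-demonstration change in the query prediction after one gradient step.

    Assumed toy setting (not from the paper): squared loss
    L(w) = 0.5 * sum_i (w @ x_i - y_i)**2, one step from w0 with step size lr.
    The step changes the query prediction by
    lr * sum_i (y_i - w0 @ x_i) * (x_i @ x_query).
    """
    residual = y - X @ w0      # what the model has not yet learned from each demo
    relevance = X @ x_query    # overlap between each demo input and the query
    return lr * residual * relevance

# Hypothetical ground-truth and current weights.
w_true = np.array([2.0, -1.0])
w0 = np.array([2.0, 0.0])
x_query = np.array([1.0, 0.0])

X = np.array([
    [1.0, 0.0],  # already learned: w0 predicts its label exactly
    [0.0, 1.0],  # irrelevant: orthogonal to the query
    [1.0, 1.0],  # informative: nonzero residual and nonzero overlap
])
y = X @ w_true

contrib = demo_contributions(X, y, w0, x_query)

# Cross-check against an explicit gradient step on the weights.
lr = 0.1
w1 = w0 + lr * X.T @ (y - X @ w0)
total_change = (w1 - w0) @ x_query

print(contrib, total_change)
```

In this toy setting only the third demonstration moves the query prediction, and the per-demonstration terms sum to the change produced by the explicit weight update.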
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Information Retrieval and Search Behavior
