Focused Large Language Models are Stable Many-Shot Learners
Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

TL;DR
This paper introduces FocusICL, a training-free method that improves large language models' stability and performance in many-shot learning by filtering distractions and focusing attention on key content, supported by theoretical and experimental validation.
Contribution
The paper proposes FocusICL, a novel attention filtering technique for large language models that enhances many-shot learning without additional training.
Findings
FocusICL improves performance by 5.2% on average over vanilla ICL.
FocusICL scales effectively with many-shot demonstrations.
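Theoretical and experimental analysis confirms that adding demonstrations disperses the model's attention away from the query, which is a key reason ICL fails to scale in many-shot settings.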
Abstract
In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations. With the increase in available context length of LLMs, recent experiments have shown that the performance of ICL does not necessarily scale well in many-shot (demonstration) settings. We theoretically and experimentally confirm that the reason lies in more demonstrations dispersing the model attention from the query, hindering its understanding of key content. Inspired by how humans learn from examples, we propose a training-free method FocusICL, which conducts triviality filtering to avoid attention being diverted by unimportant contents at token-level and operates hierarchical attention to further ensure sufficient attention towards current query at demonstration-level. We also design an efficient hyperparameter searching strategy for FocusICL based on model…
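The attention-dispersion claim in the abstract can be illustrated with a toy softmax calculation. The sketch below is not the paper's FocusICL implementation: the function names, the uniform logit values, and the `masked_fraction` parameter are all assumptions made for illustration. It shows only that, under a single softmax, the share of attention landing on a fixed set of query tokens shrinks as demonstration tokens are added, and that masking some demonstration tokens (a crude stand-in for triviality filtering) returns part of that share to the query.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf entries get weight 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def query_attention_mass(n_demo, n_query, demo_logit=0.0, query_logit=1.0,
                         masked_fraction=0.0):
    """Total attention weight that falls on the query tokens.

    Assumptions (illustrative only, not taken from the paper):
    - every demonstration token has the same logit, and so does every query token;
    - "filtering" is modelled by setting a fraction of demonstration logits to -inf.
    """
    n_masked = int(n_demo * masked_fraction)
    logits = (
        [-math.inf] * n_masked                  # filtered-out demonstration tokens
        + [demo_logit] * (n_demo - n_masked)    # remaining demonstration tokens
        + [query_logit] * n_query               # query tokens
    )
    weights = softmax(logits)
    return sum(weights[n_demo:])                # weights at the query positions


if __name__ == "__main__":
    for n_demo in (0, 100, 1000, 10000):
        plain = query_attention_mass(n_demo, n_query=10)
        masked = query_attention_mass(n_demo, n_query=10, masked_fraction=0.5)
        print(f"demo tokens={n_demo:>5}  query mass={plain:.4f}  "
              f"with 50% masked={masked:.4f}")
```

Running the script prints the query's attention share for growing demonstration counts, with and without masking. Real attention logits vary by token, head, and layer, so the numbers are only qualitative.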
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
