Scaling In-Context Demonstrations with Structured Attention
Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang

TL;DR
This paper introduces SAICL, a structured attention architecture for in-context learning in large language models. It improves efficiency, scalability, and permutation invariance, achieving comparable or better performance than full attention at lower inference cost.
Contribution
SAICL replaces full attention with structured attention, enabling scalable, permutation-invariant in-context learning with improved speed and performance.
Findings
SAICL achieves up to a 3.4x inference speed-up over full attention.
SAICL outperforms the Fusion-in-Decoder (FiD) baseline on in-context learning tasks.
SAICL scales effectively to hundreds of demonstrations.
Abstract
The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces the full-attention by a structured attention mechanism designed for in-context learning, and removes unnecessary dependencies between…
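To make the idea of replacing full attention with a structured pattern concrete, here is a minimal NumPy sketch. It is not the paper's implementation. It assumes one plausible structure: each demonstration attends only to its own tokens, while the test input (called the query here) attends to every demonstration and to itself. The function names, the non-causal mask, and the absence of positional encodings are all illustrative assumptions.

```python
import numpy as np


def structured_mask(demo_lens, query_len):
    """Return a boolean mask M where M[i, j] is True if token i may attend to token j.

    Assumed layout (hypothetical, for illustration only): all demonstrations
    are concatenated first and the query comes last.
    """
    total = sum(demo_lens) + query_len
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in demo_lens:
        # Each demonstration sees only its own tokens (block-diagonal part).
        mask[start:start + n, start:start + n] = True
        start += n
    # Query rows see every demonstration token and the query itself.
    mask[start:, :] = True
    return mask


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row has at
    # least one allowed position, so the maximum is finite.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def permute_demos(x, demo_len, order, query_len):
    """Reorder equal-length demonstration blocks of x, keeping the query last."""
    idx = [np.arange(b * demo_len, (b + 1) * demo_len) for b in order]
    n_demo_tokens = demo_len * len(order)
    idx.append(np.arange(n_demo_tokens, n_demo_tokens + query_len))
    return x[np.concatenate(idx)]


# Usage: four demonstrations of three tokens each, plus a two-token query.
mask = structured_mask([3, 3, 3, 3], query_len=2)
x = np.random.default_rng(0).normal(size=(14, 8))
out = masked_attention(x, x, x, mask)
print("allowed pairs:", int(mask.sum()), "of", mask.size)
```

Under this assumed pattern, the number of allowed attention pairs grows linearly with the number of demonstrations when demonstration length is fixed, rather than quadratically as with full attention. Because no position information is used and the query attends to the demonstrations as an unordered set, reordering the demonstrations with `permute_demos` leaves the query's output unchanged up to floating-point error.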
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
