Scaling In-Context Demonstrations with Structured Attention

Tianle Cai; Kaixuan Huang; Jason D. Lee; Mengdi Wang

arXiv:2307.02690 · cs.CL · July 7, 2023

TL;DR

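This paper introduces SAICL (Structured Attention for In-Context Learning), a structured attention mechanism that replaces full attention in large language models so that in-context learning becomes more efficient, scales to more demonstrations, and is invariant to demonstration order. It reports up to 3.4x inference speed-up and better results than a Fusion-in-Decoder (FiD) baseline.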

Contribution

SAICL replaces full attention with structured attention, enabling scalable, permutation-invariant in-context learning with improved speed and performance.

Findings

01. SAICL achieves up to 3.4x inference speed-up.

02. SAICL outperforms the FiD baseline on in-context learning tasks.

03. SAICL scales effectively to hundreds of demonstrations.

Abstract

The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their in-context learning capabilities are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning, and removes unnecessary dependencies between…
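To make the second limitation concrete, here is a minimal sketch in Python that counts how many token pairs attention must score as demonstrations are added. The block layout is an assumption chosen for illustration and is not taken from the paper: each demonstration attends only to its own tokens, and the test input attends to every demonstration plus itself. The function names and the example lengths are hypothetical.

```python
def full_attention_pairs(demo_lengths, query_length):
    """Token pairs scored when every token attends to every token."""
    total = sum(demo_lengths) + query_length
    return total * total


def structured_attention_pairs(demo_lengths, query_length):
    """Token pairs scored under the ASSUMED block layout:
    demonstrations attend within themselves, the test input attends to all."""
    within_demos = sum(n * n for n in demo_lengths)
    query_to_all = query_length * (sum(demo_lengths) + query_length)
    return within_demos + query_to_all


def build_mask(demo_lengths, query_length):
    """mask[i][j] is True when token i may attend to token j (assumed layout)."""
    # Block ids: 0..k-1 for the k demonstrations, k for the test input.
    block_of = []
    for block_id, n in enumerate(demo_lengths):
        block_of.extend([block_id] * n)
    query_id = len(demo_lengths)
    block_of.extend([query_id] * query_length)
    # A token sees its own block; test-input tokens additionally see everything.
    return [[bi == bj or bi == query_id for bj in block_of] for bi in block_of]


if __name__ == "__main__":
    for k in (8, 16):
        demos = [4] * k  # k demonstrations of 4 tokens each (hypothetical sizes)
        print(k, full_attention_pairs(demos, 4), structured_attention_pairs(demos, 4))
```

With 8 demonstrations of 4 tokens and a 4-token test input, full attention scores 1296 pairs and the assumed block layout scores 272. Doubling to 16 demonstrations gives 4624 versus 528, so the full-attention count grows quadratically while the block count grows roughly linearly in the number of demonstrations. These are pair counts under a toy layout, not the speed-ups measured in the paper.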

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications