In-Context Learning with Many Demonstration Examples

Mukai Li; Shansan Gong; Jiangtao Feng; Yiheng Xu; Jun Zhang; Zhiyong Wu; Lingpeng Kong

arXiv:2302.04931 · cs.CL · February 13, 2023 · 5 cites

TL;DR

This paper introduces EVALM, a long-range language model built on an efficient transformer mechanism. It is trained with 8k tokens per batch line and can be tested on contexts of up to 256k tokens with extrapolation, which makes in-context learning with many more demonstration examples practical.

Contribution

EVALM is a novel long-range language model with an efficient transformer mechanism, allowing scalable in-context learning with much larger context sizes than previous models.

Findings

01

EVALM achieves 4.1% higher accuracy on average across a diverse set of tasks.

02

In-context learning benefits from more demonstrations, especially when the model is instruction-tuned with 8k-token instructions.

03

Extending the instruction length to 16k tokens further improves how well in-context learning scales.

Abstract

Large pre-trained language models (PLMs) have shown promising in-context learning abilities. However, due to the backbone transformer architecture, existing PLMs are bottlenecked by memory and computational cost when scaling up to a large context size, leaving instruction tuning and in-context learning with many demonstration examples, as well as long-range language modeling, under-explored. In this study, we propose a long-range language model, EVALM, based on an efficient transformer mechanism. EVALM is trained with 8k tokens per batch line and can be tested on contexts of up to 256k tokens with extrapolation, 128 times the limit of existing PLMs (e.g. GPT3). Based on EVALM, we scale up the number of examples efficiently in both instruction tuning and in-context learning to explore the boundary of the benefits from more annotated data. Experimental results on a diverse set of tasks show that…
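The context-length figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that "k" means 1,024 tokens and that GPT-3's context limit is 2,048 tokens (neither is stated above), and the helper names and the 100-tokens-per-demonstration figure are hypothetical. It also counts the entries of a dense n x n attention score matrix, which is the standard-transformer cost the abstract describes as the bottleneck; it does not model EVALM's efficient mechanism.

```python
# Assumptions (not stated in the abstract): "k" = 1,024 tokens and
# GPT-3's context limit = 2,048 tokens.
GPT3_CONTEXT_LIMIT = 2 * 1024
EVALM_TRAIN_CONTEXT = 8 * 1024    # "8k tokens per batch line"
EVALM_TEST_CONTEXT = 256 * 1024   # "256k" contexts with extrapolation


def context_ratio(long_ctx: int, short_ctx: int) -> float:
    """How many times longer one context window is than another."""
    return long_ctx / short_ctx


def demonstrations_that_fit(context_tokens: int, tokens_per_demo: int) -> int:
    """Number of whole demonstrations that fit in a context (hypothetical helper)."""
    return context_tokens // tokens_per_demo


def dense_attention_entries(context_tokens: int) -> int:
    """Entries in one n x n score matrix of standard (dense) self-attention."""
    return context_tokens * context_tokens


if __name__ == "__main__":
    # Ratio of EVALM's test context to the assumed GPT-3 limit.
    print(context_ratio(EVALM_TEST_CONTEXT, GPT3_CONTEXT_LIMIT))  # 128.0

    # Hypothetical demonstration length of 100 tokens each.
    for ctx in (GPT3_CONTEXT_LIMIT, EVALM_TRAIN_CONTEXT, EVALM_TEST_CONTEXT):
        print(ctx, demonstrations_that_fit(ctx, 100), dense_attention_entries(ctx))
```

Under these assumptions the 256k test context is 128 times the 2,048-token limit, which matches the factor quoted in the abstract, while a dense attention score matrix would grow by the square of that factor.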

Code & Models

Repositories

shark-nlp/evalm
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Test