A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

Jacob Abernethy; Alekh Agarwal; Teodor V. Marinov; Manfred K. Warmuth

arXiv:2305.17040 · cs.LG · May 29, 2023 · 1 citation

TL;DR

This paper proposes a mechanism explaining how large language models perform in-context learning for sparse retrieval tasks, emphasizing segmentation, hypothesis inference, and application, with theoretical guarantees and empirical validation.

Contribution

It introduces a novel mechanism demonstrating how transformers can perform in-context learning for sparse retrieval, with formal sample complexity guarantees and empirical insights.

Findings

01. Segmentation of prompts is challenging in practice.

02. Attention maps correspond to the hypothesized inference process.

03. Sample complexity guarantees are established for the proposed mechanism.

Abstract

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially-ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a sparse linear regressor hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism,…
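Steps (c) and (d) of the abstract can be illustrated with a minimal Python sketch. It assumes the simplest case of a sparse linear regressor: a 1-sparse hypothesis with unit coefficient, so each label is a copy of one hidden coordinate of the input, and labels are noiseless. The function names (`infer_sparse_index`, `predict`, `make_task`) are hypothetical, and the code is an ordinary search over coordinates, not the transformer construction the paper analyzes.

```python
import random


def infer_sparse_index(examples):
    """Step (c): pick the coordinate j whose values best match the labels.

    Assumption: the hypothesis is 1-sparse with coefficient 1, i.e. y = x[j].
    """
    dim = len(examples[0][0])

    def squared_loss(j):
        return sum((x[j] - y) ** 2 for x, y in examples)

    return min(range(dim), key=squared_loss)


def predict(examples, x_test):
    """Step (d): apply the inferred hypothesis to the test input."""
    j = infer_sparse_index(examples)
    return x_test[j]


def make_task(dim, n_examples, hidden_index, seed=0):
    """Draw i.i.d. Gaussian inputs and label each with its hidden coordinate."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        examples.append((x, x[hidden_index]))
    return examples


if __name__ == "__main__":
    examples = make_task(dim=8, n_examples=5, hidden_index=3)
    print(infer_sparse_index(examples))
    print(predict(examples, [0, 1, 2, 3, 4, 5, 6, 7]))
```

With continuous inputs and noiseless labels, a handful of examples is enough to rule out every wrong coordinate in this toy setting. This is only meant to give intuition for why a sparse hypothesis class can be learned from few examples; it does not reproduce the paper's sample complexity guarantees.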

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis

Methods: Test