Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions

Satwik Bhattamishra; Arkil Patel; Phil Blunsom; Varun Kanade

arXiv:2310.03016 · cs.LG · October 5, 2023 · 1 citation

TL;DR

This paper investigates how well Transformers and large language models learn learning algorithms in-context for classes of Boolean functions. Its experiments map out where these models succeed, where they fall short, and how far the behavior carries over to attention-free models and pretrained LLMs.

Contribution

The paper measures how well Transformers learn different learning algorithms in-context, compares attention-based models with attention-free ones, and evaluates pretrained LLMs on unseen tasks.

Findings

01. Transformers nearly match the optimal learning algorithm on simpler Boolean function classes but struggle on more complex ones.

02. Attention-free models show capabilities similar to those of Transformers.

03. LLMs such as GPT-4 are competitive with nearest-neighbor baselines when predicting labels on unseen tasks.
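
To make the nearest-neighbor comparison in the third finding concrete, here is a minimal sketch of such a baseline applied to in-context examples of a Boolean function. It is an illustration under stated assumptions, not the authors' setup: the target class (a conjunction over a few input bits), the uniform random inputs, the Hamming-distance metric, and every function name below are hypothetical choices that this page does not confirm.

```python
import random


def sample_conjunction(n_bits, k, rng):
    """Assumed target class: the AND of k distinct input positions."""
    return sorted(rng.sample(range(n_bits), k))


def evaluate(conj, x):
    """Return 1 if every selected bit of x is 1, otherwise 0."""
    return int(all(x[i] == 1 for i in conj))


def make_prompt(conj, n_bits, n_examples, rng):
    """Build in-context examples (x, f(x)) with uniform random inputs (an assumption)."""
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_examples)]
    return [(x, evaluate(conj, x)) for x in xs]


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def nearest_neighbor_predict(examples, query):
    """Predict the label of the closest in-context example; ties go to the earliest one."""
    _, label = min(examples, key=lambda ex: hamming(ex[0], query))
    return label


if __name__ == "__main__":
    rng = random.Random(0)
    conj = sample_conjunction(n_bits=10, k=3, rng=rng)
    prompt = make_prompt(conj, n_bits=10, n_examples=40, rng=rng)
    queries = make_prompt(conj, n_bits=10, n_examples=200, rng=rng)
    correct = sum(nearest_neighbor_predict(prompt, x) == y for x, y in queries)
    print(f"nearest-neighbor accuracy: {correct / len(queries):.2f}")
```

A model evaluated in-context would see the same `(x, f(x))` pairs followed by a query input, so a baseline like this gives a reference point that needs no training at all.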

Abstract

In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued functions. However, the limitations of Transformers in implementing learning algorithms, and their ability to learn other forms of algorithms are not well understood. Additionally, the degree to which these capabilities are confined to attention-based models is unclear. Furthermore, it remains to be seen whether the insights derived from these stylized settings can be extrapolated to pretrained Large Language Models (LLMs). In this work, we take a step towards answering these questions by demonstrating the following: (a) On a test-bed with a variety of Boolean function classes, we find that Transformers can nearly match the optimal learning algorithm for…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection · Layer Normalization