Analyzing limits for in-context learning
Omar Naim, Jerome Bolte, Nicholas Asher

TL;DR
This paper critically examines how well transformer models learn in context, providing empirical evidence and a mathematical analysis that highlight the limits on their general predictive accuracy.
Contribution
The paper challenges prior claims that transformers, when learning in context, implicitly implement standard learning algorithms, and it attributes their failure to reach general predictive accuracy to architectural constraints.
Findings
Empirical evidence contradicts the idea that transformers learn standard algorithms.
Mathematical analysis shows inherent architectural limitations.
Transformers cannot attain general predictive accuracy.
Abstract
Our paper challenges claims from prior research that transformer-based models, when learning in context, implicitly implement standard learning algorithms. We present empirical evidence inconsistent with this view and provide a mathematical analysis demonstrating that transformers cannot achieve general predictive accuracy due to inherent architectural limitations.
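To make concrete what "implementing a standard learning algorithm in context" would mean, here is a minimal sketch. It assumes an in-context linear regression setup of the kind often used in this line of work; this page does not say which function classes or experiments the paper uses. The names `make_prompt`, `least_squares_predict`, and `baseline_error`, and the `scale` parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_prompt(rng, n_examples, dim, scale=1.0):
    """Sample one in-context task: a hidden linear map w, example pairs, and a query.

    Assumption: a noiseless linear target y = x . w (not taken from the paper).
    """
    w = rng.normal(size=dim)
    xs = rng.normal(scale=scale, size=(n_examples + 1, dim))
    ys = xs @ w
    # The last pair is held out as the query the model must predict.
    return xs[:-1], ys[:-1], xs[-1], ys[-1]

def least_squares_predict(xs, ys, x_query):
    """Standard learning algorithm used as the reference: ordinary least squares."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return float(x_query @ w_hat)

def baseline_error(scale, n_tasks=200, n_examples=20, dim=5, seed=0):
    """Mean squared error of the least-squares baseline over many sampled tasks."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_tasks):
        xs, ys, x_query, y_query = make_prompt(rng, n_examples, dim, scale)
        prediction = least_squares_predict(xs, ys, x_query)
        errors.append((prediction - y_query) ** 2)
    return float(np.mean(errors))

if __name__ == "__main__":
    # Evaluate the baseline at a typical input scale and at a much larger one.
    for scale in (1.0, 100.0):
        print(scale, baseline_error(scale))
```

With more examples than dimensions and no noise, the least-squares baseline recovers the target at any input scale. A model that really implemented this algorithm in context would produce the same predictions, so comparing a trained transformer's error against the baseline across scales is one way a claim like the one in the abstract could be probed.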
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Attention Is All You Need · Layer Normalization
