Why Larger Language Models Do In-context Learning Differently?

Zhenmei Shi; Junyi Wei; Zhuoyan Xu; Yingyu Liang

arXiv:2405.19592 · cs.LG · May 31, 2024 · 2 cites

TL;DR

This paper investigates why larger language models exhibit different in-context learning behavior. Through theoretical analysis and preliminary experiments, it shows that model size influences which features a model focuses on and how sensitive it is to noise.

Contribution

The paper provides a theoretical framework explaining how model size affects in-context learning behavior and robustness in transformers.

Findings

1. Smaller models focus on important features and are more robust to noise.
2. Larger models cover more features but are more sensitive to noise.
3. The theoretical analysis agrees with preliminary experimental results.
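
To make the first two findings concrete, here is a toy simulation under stated assumptions; it is not the paper's experiment. It uses in-context linear regression with inputs x ~ N(0, Λ), where the diagonal Λ has three high-variance ("important") features and seven low-variance ones, a task vector w ~ N(0, I), and Gaussian noise of standard deviation `noise_std` added to the context labels only. The predictor mimics a one-layer linear-attention readout, y_hat = x_q^T Γ (1/n) Σ y_i x_i, with Γ = P_r Λ^{-1}, where P_r keeps only the r largest-variance features. Treating the rank r as a proxy for model size, and using this large-context idealization of Γ instead of the paper's closed-form optimum, are both assumptions of the sketch; the function name `simulate_icl_error` and all parameter values are hypothetical.

```python
import random


def simulate_icl_error(lam, ranks, noise_std, n=100, trials=2000, seed=0):
    """Monte Carlo mean squared query error for each rank r in `ranks`.

    Hypothetical toy setup (an assumption, not the paper's experiment):
      - lam: feature variances, sorted in decreasing order (diagonal of Lambda)
      - x ~ N(0, diag(lam)), w ~ N(0, I), context labels y = w.x + N(0, noise_std^2)
      - prediction y_hat = x_q^T Gamma_r h with h = (1/n) sum_i y_i x_i and
        Gamma_r = P_r Lambda^{-1}, where P_r keeps the first r features.
    The rank r stands in for model size (also an assumption).
    """
    rng = random.Random(seed)
    d = len(lam)
    scales = [v ** 0.5 for v in lam]
    totals = {r: 0.0 for r in ranks}
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]  # task vector
        h = [0.0] * d  # running value of (1/n) * sum_i y_i x_i
        for _ in range(n):
            x = [s * rng.gauss(0.0, 1.0) for s in scales]
            y = sum(wj * xj for wj, xj in zip(w, x)) + rng.gauss(0.0, noise_std)
            for j in range(d):
                h[j] += y * x[j] / n
        x_q = [s * rng.gauss(0.0, 1.0) for s in scales]
        y_q = sum(wj * xj for wj, xj in zip(w, x_q))  # query label is noise-free
        for r in ranks:
            # Rank-r readout: only the r largest-variance features contribute.
            y_hat = sum(x_q[j] * h[j] / lam[j] for j in range(r))
            totals[r] += (y_hat - y_q) ** 2
    return {r: total / trials for r, total in totals.items()}


if __name__ == "__main__":
    lam = [1.0] * 3 + [0.2] * 7  # 3 "important" features, 7 weak ones (hypothetical)
    for noise_std in (0.0, 8.0):
        errs = simulate_icl_error(lam, ranks=(3, 10), noise_std=noise_std)
        print(f"noise_std={noise_std}: rank 3 -> {errs[3]:.3f}, rank 10 -> {errs[10]:.3f}")
```

Both noise levels reuse the same seed, so the two ranks are compared on identical draws. Under these settings the full-rank (r = 10) predictor should have the lower error on the clean context, while the rank-3 predictor should have the lower error once the context labels are noisy: dropping the weak features costs some bias, but it also drops the noise that those directions would otherwise pass through to the prediction.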

Abstract

Large language models (LLMs) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL): they can perform well on unseen tasks given only a brief series of task examples, without any adjustment to the model parameters. One recent, intriguing observation is that models of different scales may have different ICL behaviors: larger models tend to be more sensitive to noise in the test context. This work studies this observation theoretically, aiming to improve the understanding of LLMs and ICL. We analyze two stylized settings: (1) linear regression with one-layer single-head linear transformers and (2) parity classification with two-layer transformers with multiple attention heads (non-linear data and non-linear model). In both settings, we give closed-form optimal solutions and find that smaller models emphasize important hidden features…
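
The first setting can be written down concretely. The following is a minimal sketch in our own notation, not necessarily the paper's: context pairs (x_i, y_i) with y_i = w^⊤x_i + ε_i, inputs x ~ N(0, Λ), i.i.d. label noise ε_i ~ N(0, σ²) independent of the inputs, and a merged attention weight Γ. In a common parameterization of a one-layer single-head linear transformer, the query prediction is the first line below. The noise part of its error follows from Cov(u) = σ²Λ/n, and if model size is modeled (as an assumption) by a rank-r weight Γ = P_r Λ^{-1}, with P_r projecting onto the r largest-variance features, that noise part grows linearly in r.

```latex
\begin{aligned}
\hat{y}_q &= x_q^\top \Gamma \Big( \tfrac{1}{n} \textstyle\sum_{i=1}^{n} y_i x_i \Big),
  \qquad y_i = w^\top x_i + \epsilon_i, \\
\mathbb{E}\big[(x_q^\top \Gamma u)^2\big] &= \tfrac{\sigma^2}{n} \operatorname{tr}\big(\Gamma^\top \Lambda \Gamma \Lambda\big),
  \qquad u = \tfrac{1}{n} \textstyle\sum_{i=1}^{n} \epsilon_i x_i, \\
\Gamma = P_r \Lambda^{-1} \;&\Longrightarrow\; \mathbb{E}\big[(x_q^\top \Gamma u)^2\big] = \tfrac{\sigma^2 r}{n}.
\end{aligned}
```

This is one way to see why covering more feature directions can also mean passing more context noise into the prediction; it is an illustration under the stated assumptions, not the paper's closed-form solution.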

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Balanced Selection · Linear Regression