Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization

Zixuan Gong; Xiaolin Hu; Huayi Tang; Yong Liu

arXiv:2502.17024·cs.CL·February 25, 2025



TL;DR

This paper provides a theoretical framework for understanding how in-context learning emerges from generalization in large language models, focusing on auto-regressive next-token prediction and addressing limitations of previous analyses.

Contribution

It introduces a formal pre-training and ICL framework emphasizing sequence and topic dependencies, with new PAC-Bayesian generalization bounds for LLMs.

Findings

01 ICL emerges from sequence and topic generalization.

02 Theoretical bounds depend on data, topics, and optimization.

03 Experimental validation on synthetic and real datasets.

Abstract

Large language models (LLMs) have demonstrated remarkable in-context learning (ICL) abilities. However, existing theoretical analyses of ICL exhibit two main limitations: (a) Limited i.i.d. Setting. Most studies focus on supervised function learning tasks where prompts are constructed with i.i.d. input-label pairs. This i.i.d. assumption diverges significantly from real language learning scenarios where prompt tokens are interdependent. (b) Lack of Emergence Explanation. Most literature answers what ICL does from an implicit optimization perspective but falls short in elucidating how ICL emerges and the impact of the pre-training phase on ICL. In our paper, to extend (a), we adopt a more practical paradigm, auto-regressive next-token prediction (AR-NTP), which closely aligns with the actual training of language models. Specifically, within AR-NTP, we emphasize prompt token-dependency,…
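
To make the contrast with the i.i.d. setting concrete, the following is a minimal sketch of an auto-regressive next-token prediction loss, in which each token is scored conditionally on what came before it. It is an illustration only and not the paper's model or bounds: the bigram model (which conditions on a single previous token rather than the full prefix), the add-alpha smoothing, and the names `fit_bigram` and `ar_ntp_loss` are all assumptions introduced here.

```python
import math

def fit_bigram(sequences, vocab_size, alpha=1.0):
    """Estimate p(next | prev) from token sequences with add-alpha smoothing.

    Hypothetical toy model: row index `vocab_size` is a begin-of-sequence
    state, so the first token of every sequence is also predicted.
    """
    counts = [[alpha] * vocab_size for _ in range(vocab_size + 1)]
    for seq in sequences:
        prev = vocab_size  # begin-of-sequence state
        for tok in seq:
            counts[prev][tok] += 1.0
            prev = tok
    # Normalize each row into a conditional distribution over the next token.
    return [[c / sum(row) for c in row] for row in counts]

def ar_ntp_loss(probs, seq, vocab_size):
    """Average negative log-likelihood of `seq` under the chain rule.

    The loss sums -log p(x_t | x_{t-1}) over positions, so every term
    depends on the preceding token instead of treating tokens as i.i.d.
    """
    prev = vocab_size
    nll = 0.0
    for tok in seq:
        nll -= math.log(probs[prev][tok])
        prev = tok
    return nll / len(seq)

# Toy "pre-training" corpus over the vocabulary {0, 1} with an alternating pattern.
corpus = [[0, 1, 0, 1, 0, 1]] * 5
probs = fit_bigram(corpus, vocab_size=2)

# A prompt that follows the learned dependency scores a lower loss
# than one that breaks it.
loss_consistent = ar_ntp_loss(probs, [0, 1, 0, 1], vocab_size=2)
loss_inconsistent = ar_ntp_loss(probs, [1, 1, 0, 0], vocab_size=2)
print(loss_consistent, loss_inconsistent)
```

A language model replaces the bigram table with a network that conditions on the whole prefix, but the loss keeps the same token-by-token factorization.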


Videos

Towards Auto-Regressive Next-Token Prediction: In-Context Learning Emerges from Generalization · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · Machine Learning and Data Classification

Methods: ADaptive gradient method with the OPTimal convergence rate · Focus