From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When
Kevin Christian Wibisono, Yixin Wang

TL;DR
This paper investigates how large language models trained on unstructured text can perform in-context learning (ICL). It shows that many ICL capabilities arise from simple co-occurrence of semantically related word pairs and highlights the importance of training data structure.
Contribution
It demonstrates that in-context learning can emerge from unstructured data through co-occurrence, clarifies when positional information is necessary, and identifies limitations in logic reasoning and analogy tasks.
Findings
ICL capabilities can arise from co-occurrence in unstructured data
Positional information is crucial for logic reasoning tasks
ICL fails in analogy completion when relevant word pairs appear only in fixed positions during training
Abstract
Large language models (LLMs) like transformers demonstrate impressive in-context learning (ICL) capabilities, allowing them to make predictions for new tasks based on prompt exemplars without parameter updates. While existing ICL theories often assume structured training data resembling ICL tasks (e.g., x-y pairs for linear regression), LLMs are typically trained unsupervised on unstructured text, such as web content, which lacks clear parallels to tasks like word analogy. To address this gap, we examine what enables ICL in models trained on unstructured data, focusing on critical sequence model requirements and training data structure. We find that many ICL capabilities can emerge simply from co-occurrence of semantically related word pairs in unstructured data; word analogy completion, for example, can provably arise purely through co-occurrence modeling, using classical language…
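The co-occurrence claim in the abstract can be illustrated with a minimal sketch. The toy corpus, the helper names `cooccurrence_counts` and `complete`, and the scoring rule below are all assumptions made for illustration. This is a count-based stand-in, not the model analyzed in the paper: it scores each candidate word by how often it shares a sentence with the prompt words, and it skips words already in the prompt.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: unordered "sentences" in which country-capital
# pairs co-occur alongside one filler word each. Word order is irrelevant here.
corpus = [
    "paris markets japan france tokyo".split(),
    "italy france weather rome paris".split(),
    "berlin paris germany trains france".split(),
    "tokyo rome japan italy festival".split(),
    "germany tokyo museum berlin japan".split(),
    "rome berlin italy germany rivers".split(),
]

def cooccurrence_counts(sentences):
    """Count, for every unordered word pair, the sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            counts[(a, b)] += 1
    return counts

def complete(prompt, counts, vocab):
    """Return the out-of-prompt word with the highest summed co-occurrence.

    Excluding words already in the prompt is a simplifying assumption of
    this sketch, not a step taken from the paper.
    """
    context = set(prompt)

    def score(candidate):
        return sum(counts[tuple(sorted((candidate, w)))] for w in context)

    candidates = [w for w in vocab if w not in context]
    return max(candidates, key=score)

counts = cooccurrence_counts(corpus)
vocab = sorted({w for sentence in corpus for w in sentence})

# ICL-style prompt: two country-capital exemplars followed by a query country.
print(complete("france paris japan tokyo italy".split(), counts, vocab))  # rome
```

In this toy corpus the correct capital wins because it shares three sentences with the query country, while every other candidate shares fewer sentences with the prompt words overall. No positional information is used, which is consistent with the finding that analogy-style completion can arise from co-occurrence alone.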
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Algorithms
