The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design
Yoav Levine, Noam Wies, Daniel Jannai, Dan Navon, Yedid Hoshen, Amnon Shashua

TL;DR
This paper reveals a bias in neural language model pretraining caused by contiguous text chunking, which affects dependency modeling, and proposes a new example design called "kNN-Pretraining" to enhance model capabilities.
Contribution
It formalizes the bias introduced by contiguous chunking in pretraining and introduces "kNN-Pretraining" to improve language understanding and question answering.
Findings
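A pretrained NLM models much stronger dependencies between text segments that appeared in the same training example than between segments that appeared in different training examples.
Including semantically related sentences in the same training example improves representations.
The proposed kNN-Pretraining scheme enhances question answering abilities.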
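The summary above does not spell out how kNN-Pretraining builds its examples, so the following is only a minimal sketch of the general idea: place each sentence in the same training example as its nearest neighbors from elsewhere in the corpus. The function names (`embed`, `cosine`, `knn_examples`), the bag-of-words similarity, and the neighbors-first ordering are all illustrative assumptions, not the authors' implementation; a real pipeline would presumably use a learned sentence encoder.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_examples(sentences, k):
    """Build one training example per sentence: its k most similar
    sentences from the corpus, followed by the sentence itself.

    Hypothetical design choice: neighbors come first so that the model
    sees related context before the target sentence.
    """
    vectors = [embed(s) for s in sentences]
    examples = []
    for i, sentence in enumerate(sentences):
        # Rank every other sentence by similarity to sentence i.
        candidates = [j for j in range(len(sentences)) if j != i]
        candidates.sort(key=lambda j: cosine(vectors[i], vectors[j]), reverse=True)
        neighbors = [sentences[j] for j in candidates[:k]]
        examples.append(neighbors + [sentence])
    return examples


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "stock prices fell sharply today",
        "a cat sat on a rug",
        "prices of stock rose today",
    ]
    for example in knn_examples(corpus, k=1):
        print(" | ".join(example))
```

With contiguous chunking, the two cat sentences and the two stock sentences in this toy corpus would only share an example if they happened to be adjacent in the text; grouping by similarity puts them together regardless of position, which is the in-context bias the paper argues pretraining should exploit.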
Abstract
Pretraining Neural Language Models (NLMs) over a large corpus involves chunking the text into training examples, which are contiguous text segments of sizes processable by the neural architecture. We highlight a bias introduced by this common practice: we prove that the pretrained NLM can model much stronger dependencies between text segments that appeared in the same training example, than it can between text segments that appeared in different training examples. This intuitive result has a twofold role. First, it formalizes the motivation behind a broad line of recent successful NLM training heuristics, proposed for the pretraining and fine-tuning stages, which do not necessarily appear related at first glance. Second, our result clearly indicates further improvements to be made in NLM pretraining for the benefit of Natural Language Understanding tasks. As an example, we propose…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
