Metadata Conditioning Accelerates Language Model Pre-training
Tianyu Gao, Alexander Wettig, Luxi He, Yihe Dong, Sadhika Malladi, Danqi Chen

TL;DR
The paper introduces Metadata Conditioning then Cooldown (MeCo), a simple method that accelerates language model pre-training by using metadata cues, enabling faster training and more controllable outputs without extra computation.
Contribution
MeCo is a novel, straightforward approach that incorporates metadata during pre-training and allows for model steering, improving efficiency and controllability of language models.
Findings
MeCo reduces pre-training data requirements by 33%.
Models trained with MeCo perform comparably to standard pre-trained models on downstream tasks.
MeCo enables steering outputs via metadata cues.
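Steering with metadata cues amounts to conditioning generation on a metadata prefix at inference time. The following is a minimal sketch, assuming a plain-text "URL: ...\n\n" prefix. The template and the function name `steer_prompt` are hypothetical and not taken from the paper. The only point it illustrates is that the same prompt can be sent with or without a metadata cue.

```python
from typing import Optional

# Hypothetical prefix format (an assumption). For the cue to be meaningful it
# should match whatever format was used during metadata-conditioned pre-training.
PREFIX_TEMPLATE = "URL: {url}\n\n"


def steer_prompt(prompt: str, url: Optional[str] = None) -> str:
    """Optionally prepend a metadata cue to a prompt to steer generation.

    With url=None the prompt is returned unchanged, reflecting that a
    MeCo-trained model is meant to work normally without metadata.
    """
    if url is None:
        return prompt
    return PREFIX_TEMPLATE.format(url=url) + prompt


if __name__ == "__main__":
    question = "Q: What is the capital of France?\nA:"
    print(repr(steer_prompt(question)))                       # unsteered
    print(repr(steer_prompt(question, "www.wikipedia.org")))  # steered by a URL cue
```

The resulting string would be passed to the model's usual generation call. Which metadata values steer toward which behaviors is an empirical question that this sketch does not address.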
Abstract
The vast diversity of styles, domains, and quality levels present in language model pre-training corpora is essential in developing general model capabilities, but efficiently learning and deploying the correct behaviors exemplified in each of these heterogeneous data sources is challenging. To address this, we propose a new method, termed Metadata Conditioning then Cooldown (MeCo), to incorporate additional learning cues during pre-training. MeCo first provides metadata (e.g., URLs like www.wikipedia.org) alongside the text during training and later uses a cooldown phase with only the standard text, thereby enabling the model to function normally even without metadata. MeCo significantly accelerates pre-training across different model scales (600M to 8B parameters) and training sources (C4, RefinedWeb, and DCLM). For instance, a 1.6B language model trained with MeCo matches the…
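The two-phase schedule described above can be sketched as a data-preparation step. This is a minimal illustration under several assumptions, none of which the abstract states. The exact prefix format ("URL: ...\n\n"), the 10% cooldown fraction, and the choice to exclude prefix characters from the loss are all assumptions. The helper names `build_example`, `in_cooldown`, and `make_batch` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical prefix template. The abstract only says metadata such as URLs is
# provided "alongside the text", so this exact format is an assumption.
PREFIX_TEMPLATE = "URL: {url}\n\n"


def build_example(text: str, url: Optional[str]) -> Tuple[str, int]:
    """Return (training string, length of the metadata prefix in characters).

    The returned length marks the leading span that a training loop could
    exclude from the loss. This is an assumed design choice, and a real
    pipeline would map it to token positions. Without a URL, the text is
    returned unchanged.
    """
    if url is None:
        return text, 0
    prefix = PREFIX_TEMPLATE.format(url=url)
    return prefix + text, len(prefix)


def in_cooldown(step: int, total_steps: int, cooldown_frac: float = 0.1) -> bool:
    """True once training enters the final `cooldown_frac` of steps (10% is an assumption)."""
    cooldown_start = total_steps - round(total_steps * cooldown_frac)
    return step >= cooldown_start


def make_batch(
    docs: List[Tuple[str, Optional[str]]], step: int, total_steps: int
) -> List[Tuple[str, int]]:
    """Condition on metadata before the cooldown phase; use plain text during cooldown."""
    cooldown = in_cooldown(step, total_steps)
    return [build_example(text, None if cooldown else url) for text, url in docs]


if __name__ == "__main__":
    docs = [("The Eiffel Tower is in Paris.", "www.wikipedia.org")]
    print(make_batch(docs, step=0, total_steps=100))   # metadata-conditioned phase
    print(make_batch(docs, step=95, total_steps=100))  # cooldown phase: plain text
```

Early steps see the URL-prefixed string. Steps in the final stretch see only the raw text, which matches the abstract's description of a cooldown "with only the standard text". The model, tokenizer, and loss computation are deliberately left out.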
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
