Actor-Critic based Online Data Mixing For Language Model Pre-Training
Jing Ma, Chenhao Dang, Mingjie Liao

TL;DR
This paper introduces an actor-critic based online data mixing method for language model pretraining that dynamically adapts data domain weights, leading to faster convergence and improved performance on benchmarks.
Contribution
The paper develops AC-ODM, a method that captures domain interactions and adapts the sampling strategy using auxiliary actor-critic networks, improving pretraining efficiency.
Findings
Achieves 71% faster convergence compared to previous methods.
Improves zero-shot MMLU accuracy by 27.5%.
Achieves about 2.23x higher pass@1 on HumanEval.
Abstract
The coverage and composition of pretraining data significantly impact the generalization ability of Large Language Models (LLMs). To reduce the carbon footprint and financial cost of training, some data mixing methods have been proposed that apply the domain weights optimized on a small proxy model to train a larger one. However, these methods do not evolve with the training dynamics. The existing online data mixing (ODM) method addresses this limitation by using a multi-armed bandit algorithm as the data sampling strategy. Yet it does not consider intra-domain interactions. In this paper, we develop an actor-critic based online data mixing (AC-ODM) method, which captures the varying domain weights with auxiliary actor-critic networks and considers intra-domain interactions through the reward function. While constructing the dataset to pretrain a large target LLM, we directly…
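The abstract describes the earlier ODM method as treating domain sampling as a multi-armed bandit problem. A minimal sketch of that general idea, assuming an EXP3-style update and a reward already scaled to [0, 1] (for example, a normalized training loss), might look like the following. The class name `Exp3DomainSampler`, the `gamma` value, and the reward definition are illustrative assumptions rather than details taken from the paper, and the sketch does not implement AC-ODM's actor-critic networks.

```python
import math
import random


class Exp3DomainSampler:
    """EXP3 bandit over data domains (illustrative sketch, not the paper's method)."""

    def __init__(self, num_domains, gamma=0.1, seed=0):
        self.k = num_domains
        self.gamma = gamma  # exploration rate; 0.1 is an assumed default
        self.log_weights = [0.0] * num_domains  # kept in log space for stability
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the softmax of the weights with uniform exploration."""
        m = max(self.log_weights)
        exp_w = [math.exp(w - m) for w in self.log_weights]
        total = sum(exp_w)
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in exp_w]

    def sample(self):
        """Pick the domain to draw the next training batch from."""
        probs = self.probabilities()
        return self.rng.choices(range(self.k), weights=probs, k=1)[0]

    def update(self, domain, reward):
        """Importance-weighted EXP3 update; reward is assumed to lie in [0, 1]."""
        p = self.probabilities()[domain]
        self.log_weights[domain] += self.gamma * (reward / p) / self.k


def simulate(steps=200):
    """Toy loop: hypothetical domain 2 always yields the highest reward."""
    sampler = Exp3DomainSampler(num_domains=3)
    fake_rewards = [0.1, 0.2, 0.9]  # stand-in for a per-domain training signal
    for _ in range(steps):
        d = sampler.sample()
        sampler.update(d, fake_rewards[d])
    return sampler.probabilities()
```

In a real pretraining loop, the reward would come from the model's training signal on the sampled batch instead of the fixed `fake_rewards` list. The uniform-exploration term keeps every domain's sampling probability at or above `gamma / k`, so no domain is starved entirely.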
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Speech and dialogue systems
