Velocitune: A Velocity-based Dynamic Domain Reweighting Method for Continual Pre-training
Zheheng Luo, Xin Zhang, Xiao Liu, Haoling Li, Yeyun Gong, Chen Qi, Peng Cheng

TL;DR
Velocitune is a dynamic domain reweighting framework for continual pre-training. It improves language model performance by balancing learning velocity across domains, with a scaling law indicating the learning goal for each domain.
Contribution
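The paper proposes Velocitune, a method that dynamically adjusts domain data proportions during continual pre-training. It assesses each domain's learning velocity and uses a scaling law to indicate the learning goal for each domain, targeting domain-adaptive continual pre-training, a setting that few reweighting methods have addressed.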
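As a rough illustration of how a scaling law might supply a per-domain learning goal, the sketch below fits a power law to a few (tokens, loss) measurements from small pilot runs and extrapolates to the full data budget. The functional form `loss ≈ a * tokens**(-b)`, the function names, and the sample numbers are all assumptions made for illustration; the summary above does not state which scaling law the authors use.

```python
import numpy as np


def fit_power_law(tokens, losses):
    """Fit loss ≈ a * tokens**(-b) by least squares in log-log space.

    Assumed form for illustration only; returns (a, b).
    """
    tokens = np.asarray(tokens, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # log(loss) = log(a) - b * log(tokens), so a straight-line fit suffices.
    slope, intercept = np.polyfit(np.log(tokens), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_target_loss(tokens, losses, full_tokens):
    """Extrapolate the fitted curve to the full token budget for one domain."""
    a, b = fit_power_law(tokens, losses)
    return a * float(full_tokens) ** (-b)


if __name__ == "__main__":
    # Hypothetical pilot measurements for a single domain.
    pilot_tokens = [1e6, 2e6, 4e6]
    pilot_losses = [2.10, 1.96, 1.83]
    print(predict_target_loss(pilot_tokens, pilot_losses, full_tokens=1e8))
```

The appeal of this kind of extrapolation is cost: a handful of short runs per domain stands in for training a full reference model to obtain target losses.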
Findings
Improves performance on math and code reasoning tasks.
Improves performance on command-line generation benchmarks.
Its effectiveness is attributed to target loss prediction and data ordering.
Abstract
It is well known that a diverse corpus is critical for training large language models, and such corpora are typically constructed from a mixture of various domains. In general, previous efforts resort to sampling training data from different domains with static proportions, or to adjusting data proportions during training. However, few methods have addressed the complexities of domain-adaptive continual pre-training. To fill this gap, we propose Velocitune, a novel framework that dynamically assesses learning velocity and adjusts data proportions accordingly, favoring slower-learning domains while shunning faster-learning ones. The framework is guided by a scaling law that indicates the desired learning goal for each domain at lower associated cost. To evaluate the effectiveness of Velocitune, we conduct experiments on a reasoning-focused dataset with CodeLlama, as well as on a corpus specialised for system…
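The reweighting idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact update rule: progress is measured as the fraction of the initial-to-target loss gap that remains (a proxy for learning velocity, where a larger value means a slower-learning domain), and weights are adjusted with a multiplicative exponential update followed by renormalization. The names `remaining_gap`, `reweight`, and `step_size` are hypothetical.

```python
import numpy as np


def remaining_gap(current, initial, target):
    """Per-domain fraction of the initial-to-target loss gap still open.

    Values near 1 mean little progress (slow domain); values near 0 mean
    the domain has almost reached its target (fast domain).
    """
    current, initial, target = (np.asarray(x, dtype=float)
                                for x in (current, initial, target))
    gap = np.maximum(initial - target, 1e-8)  # guard against division by zero
    return np.clip((current - target) / gap, 0.0, None)


def reweight(weights, current, initial, target, step_size=1.0):
    """Shift sampling weight toward domains that lag behind their targets.

    Assumed update: w_i <- w_i * exp(step_size * gap_i), then renormalize.
    """
    weights = np.asarray(weights, dtype=float)
    lag = remaining_gap(current, initial, target)
    updated = weights * np.exp(step_size * lag)
    return updated / updated.sum()


if __name__ == "__main__":
    # Two hypothetical domains starting at equal weight and equal initial loss.
    w = reweight(weights=[0.5, 0.5],
                 current=[1.8, 1.2],   # domain 0 has learned more slowly
                 initial=[2.0, 2.0],
                 target=[1.0, 1.0])
    print(w)  # domain 0 receives the larger share
```

Normalizing by the initial-to-target gap makes progress comparable across domains whose raw losses live on different scales, which is why a per-domain target loss is needed in the first place.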
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Flow Measurement and Analysis
