Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

Kailai Yang; Xiao Liu; Lei Ji; Hao Li; Xiao Liang; Zhiwei Liu; Yeyun Gong; Peng Cheng; Mao Yang

arXiv:2507.15640·cs.LG·April 14, 2026

TL;DR

This paper introduces Data Mixing Agent, a reinforcement learning-based framework that learns to re-weight domain data for continual pre-training, improving model performance and generalization across tasks.

Contribution

It proposes the first end-to-end, model-based approach to learn domain re-weighting heuristics, surpassing manual methods and demonstrating broad applicability.

Findings

01. Outperforms strong baselines in continual pre-training for math reasoning.

02. Generalizes well across unseen source fields, target models, and domain spaces.

03. Efficiently achieves better performance with less source data.

Abstract

Continual pre-training on small-scale task-specific data is an effective method for improving large language models in new target fields, yet it risks catastrophic forgetting of their original capabilities. A common solution is to re-weight training data mixtures from source and target fields on a domain space to achieve balanced performance. Previous domain re-weighting strategies rely on manual designation with certain heuristics based on human intuition or empirical results. In this work, we prove that more general heuristics can be parameterized by proposing Data Mixing Agent, the first model-based, end-to-end framework that learns to re-weight domains. The agent learns generalizable heuristics through reinforcement learning on large quantities of data mixing trajectories with corresponding feedback from an evaluation environment. Experiments in continual pre-training on math…
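To make the idea of re-weighting a data mixture over domains concrete, here is a minimal Python sketch. It is not the paper's agent: the domain names, the function names, and the multiplicative update in `reweight` are all hypothetical stand-ins chosen for illustration. In the paper, the re-weighting decision is produced by a learned model trained with reinforcement learning; the sketch only shows what a domain distribution is, how a batch could be drawn from it, and where feedback from an evaluation environment would enter.

```python
import math
import random
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative domain weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {domain: w / total for domain, w in weights.items()}


def sample_domains(
    weights: Dict[str, float], batch_size: int, rng: random.Random
) -> List[str]:
    """Draw the domain of each example in a batch according to the mixture."""
    probs = normalize(weights)
    domains = list(probs)
    return rng.choices(domains, weights=[probs[d] for d in domains], k=batch_size)


def reweight(
    weights: Dict[str, float], feedback: Dict[str, float], step_size: float = 0.5
) -> Dict[str, float]:
    """Hypothetical placeholder for the agent's decision.

    Domains with positive feedback gain weight and domains with negative
    feedback lose weight. This hand-written rule is an assumption for
    illustration only; it is NOT the learned policy described in the paper.
    """
    updated = {
        domain: w * math.exp(step_size * feedback.get(domain, 0.0))
        for domain, w in weights.items()
    }
    return normalize(updated)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical domain space: two source domains and one target domain.
    mixture = normalize({"source_web": 2.0, "source_code": 1.0, "target_math": 1.0})
    batch = sample_domains(mixture, batch_size=8, rng=rng)
    # Hypothetical feedback signal, e.g. a change in evaluation score per domain.
    mixture = reweight(mixture, {"target_math": 1.0, "source_web": -0.5})
    print(batch)
    print(mixture)
```

A sequence of such mixtures, one per training stage, together with the feedback received at each stage, is roughly what the abstract calls a data mixing trajectory.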
