Data Mixing Optimization for Supervised Fine-Tuning of Large Language Models
Yuan Li, Zhengzhong Liu, and Eric Xing

TL;DR
This paper presents a novel optimization-based method for data mixing in supervised fine-tuning of large language models, improving performance and reducing reliance on grid search for weight selection.
Contribution
We introduce a new optimization framework for data mixing in LLM fine-tuning, with a model-based approach that minimizes validation loss and enhances domain-specific performance.
Findings
Models trained with our optimized weights perform on par with models trained with weights found by grid search.
Reweighting datasets with our method improves both validation loss and downstream results.
The approach generalizes to domain-specific data selection.
Abstract
Optimizing data mixtures for supervised fine-tuning (SFT) of large language models (LLMs) is critical for developing general-purpose models, yet this area remains underexplored. In this paper, we frame data mixing as an optimization problem and introduce a novel method designed to minimize validation loss. Our approach parametrizes the loss by modeling effective data transferred and leveraging scaling laws for fine-tuning. By experimenting with various small-scale data mixtures, we fit these parameters and derive the optimal weights. We provide both mathematical proofs and empirical results demonstrating that our algorithm achieves excellent overall and individual performance across all domains. Through controlled experiments, we show that models trained with our optimized weights perform on par with those using optimal weights determined via grid search, with per-domain loss only 0.66%…
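The fit-then-optimize idea described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it assumes each domain's validation loss follows a plain power law in the number of in-domain examples, ignores the cross-domain "effective data transferred" term, handles only two domains, and uses synthetic loss values. The names `fit_power_law`, `predicted_total_loss`, and `optimal_weight` are hypothetical.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(A) - b * log(size). Returns (A, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope

def predicted_total_loss(w, total, params):
    """Sum of the two fitted per-domain losses when domain 1 gets weight w."""
    (a1, b1), (a2, b2) = params
    return a1 * (w * total) ** (-b1) + a2 * ((1 - w) * total) ** (-b2)

def optimal_weight(total, params, lo=1e-6, hi=1 - 1e-6, iters=200):
    """Ternary search for the minimizing weight; valid because the
    predicted loss is convex in w when both exponents are positive."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if predicted_total_loss(m1, total, params) < predicted_total_loss(m2, total, params):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

# Synthetic small-scale runs (assumed values, not results from the paper).
sizes = [1000, 2000, 4000, 8000]
domain1_losses = [5.0 * s ** -0.30 for s in sizes]
domain2_losses = [8.0 * s ** -0.15 for s in sizes]

params = [fit_power_law(sizes, domain1_losses),
          fit_power_law(sizes, domain2_losses)]
w_star = optimal_weight(total=10_000, params=params)
print(f"weight for domain 1: {w_star:.3f}, for domain 2: {1 - w_star:.3f}")
```

The search runs on the fitted loss model only, so no additional training runs are needed to pick the weights. Extending this to more domains or to a transfer term would need a constrained optimizer over the probability simplex.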
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
