DEM: Distribution Edited Model for Training with Mixed Data Distributions

Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha

arXiv:2406.15570 · cs.CL · November 6, 2024




TL;DR

The paper introduces DEM, a cost-effective method that combines models individually trained on different data sources to improve multi-task training, outperforming traditional data mixing approaches.

Contribution

DEM offers a simple, scalable, and more efficient alternative to data mixing: it combines models trained on individual data sources with the base model through element-wise operations, reducing training cost while improving performance.
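One way to write the combination down, assuming it takes the common task-arithmetic form of a weighted sum of per-source weight differences (the symbols θ_0 for the base parameters, θ_i for a model fine-tuned on data source i, Δ_i, ω_i and n are our notation, not taken from this page):

```latex
\Delta_i = \theta_i - \theta_0, \qquad
\theta_{\mathrm{DEM}} = \theta_0 + \sum_{i=1}^{n} \omega_i \, \Delta_i
```

Under this assumption every operation is element-wise over the parameter vectors, so the merge itself is a single pass over the weights and needs no further gradient updates.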

Findings

1. DEM is 11x cheaper than standard data mixing.
2. DEM improves performance on multiple benchmarks, by up to 16.1% (on DROP).
3. DEM does not require full re-training when data sources change.

Abstract

Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit with sub-optimal performance across data sources, and they require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization of the data sources: combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, namely Distribution Edited Model (DEM), is 11x cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding up to 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, 6% on MathQA, and 9.3% on…
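The element-wise combination described in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it assumes parameters are stored as dictionaries mapping a tensor name to a flat list of floats, and that the combination is a weighted sum of "distribution vectors" (fine-tuned minus base weights) added back to the base model. The function names, the tensor name `layer.weight`, the example models and the weights are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter layout: tensor name -> flat list of floats.
Params = Dict[str, List[float]]


def distribution_vector(base: Params, finetuned: Params) -> Params:
    """Element-wise difference between a model fine-tuned on one data source and the base model."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])] for name in base}


def distribution_edited_model(
    base: Params, finetuned_models: Sequence[Params], weights: Sequence[float]
) -> Params:
    """Assumed DEM-style merge: base + sum_i w_i * (theta_i - base), computed element-wise."""
    if len(finetuned_models) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    # Start from a copy of the base parameters so the inputs are not mutated.
    merged = {name: list(values) for name, values in base.items()}
    for model, w in zip(finetuned_models, weights):
        delta = distribution_vector(base, model)
        for name, values in delta.items():
            merged[name] = [m + w * d for m, d in zip(merged[name], values)]
    return merged


# Usage with two hypothetical single-source models.
base = {"layer.weight": [1.0, 2.0]}
math_model = {"layer.weight": [1.5, 2.0]}
code_model = {"layer.weight": [1.0, 3.0]}
dem = distribution_edited_model(base, [math_model, code_model], [1.0, 0.5])
print(dem)  # {'layer.weight': [1.5, 2.5]}
```

Because each data source contributes only through its own fine-tuned model, changing one source in this sketch means retraining that single model and rerunning the cheap merge, which is consistent with the finding that DEM does not require full re-training.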


Videos

DEM: Distribution Edited Model for Training with Mixed Data Distributions (Underline)
