GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining

Simin Fan; Maria Ios Glarou; Martin Jaggi

arXiv:2505.20380 · cs.LG · May 28, 2025

TL;DR

GRAPE is a novel adaptive pretraining framework that dynamically optimizes data mixtures across multiple sources and target tasks, improving large language models' robustness and performance on diverse benchmarks.

Contribution

Introduces a multi-source, multi-target domain reweighting method that uses minimax optimization to improve model robustness across multiple tasks and languages.

Findings

01 Outperforms baseline methods on 6 reasoning benchmarks.

02 Effectively identifies optimal training mixtures for multilingual tasks.

03 Improves language modeling for 8 low-resource languages.

Abstract

The performance of large language models (LLMs) across diverse downstream applications is fundamentally governed by the quality and composition of their pretraining corpora. Existing domain reweighting algorithms primarily optimize data mixtures for a single target task, thereby resulting in models that overfit to specialized objectives while exhibiting substantial performance degradation on other benchmarks. This paper introduces Group Robust Multi-target Adaptive PrEtraining (GRAPE), a novel multi-source-multi-target domain reweighting framework designed to calibrate pretraining data mixtures for robust performance across multiple target tasks simultaneously. GRAPE dynamically adjusts sampling weights across source domains (domain weights) while concurrently modulating task weights that quantify the relative importance of each individual target task. This adaptive process prioritizes…
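
To make the interplay between domain weights and task weights concrete, here is a minimal sketch of one round of a generic group-robust minimax reweighting step. It is not the authors' algorithm: the exponentiated-gradient update, the use of current task loss as the signal for task weights, the `alignment` matrix (a hypothetical estimate of how much training on each source domain helps each target task), and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np


def simplex_step(weights, scores, step_size):
    """Exponentiated-gradient step: raise weights with high scores, stay on the simplex."""
    logits = np.log(weights) + step_size * scores
    logits -= logits.max()  # subtract the max for numerical stability
    updated = np.exp(logits)
    return updated / updated.sum()


def minimax_reweight_step(domain_w, task_w, task_losses, alignment,
                          lr_domain=1.0, lr_task=1.0):
    """One illustrative round of multi-source, multi-target reweighting.

    domain_w:    shape (K,), sampling weights over source domains.
    task_w:      shape (N,), importance weights over target tasks.
    task_losses: shape (N,), current loss on each target task (assumed signal).
    alignment:   shape (K, N), hypothetical benefit of domain k for task n.
    """
    # Max player: move task weight toward the tasks that are currently worst off.
    new_task_w = simplex_step(task_w, task_losses, lr_task)
    # Min player: move domain weight toward domains that help the weighted tasks.
    domain_scores = alignment @ new_task_w
    new_domain_w = simplex_step(domain_w, domain_scores, lr_domain)
    return new_domain_w, new_task_w


# Toy example: 3 source domains, 2 target tasks, all values made up.
domain_w = np.full(3, 1.0 / 3.0)
task_w = np.full(2, 0.5)
task_losses = np.array([2.0, 1.0])          # task 0 is lagging
alignment = np.array([[0.9, 0.1],           # domain 0 mostly helps task 0
                      [0.1, 0.9],           # domain 1 mostly helps task 1
                      [0.5, 0.5]])          # domain 2 helps both equally
domain_w, task_w = minimax_reweight_step(domain_w, task_w, task_losses, alignment)
```

In this toy run the lagging task receives more task weight, and the domain most aligned with it receives more sampling weight. In an actual training loop the losses and alignment estimates would be recomputed as the model changes, which is what makes the mixture adaptive.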

Taxonomy

Topics: Advanced Measurement and Detection Methods