AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs

Nicholas E. Corrado; Julian Katz-Samuels; Adithya Devraj; Hyokun Yun; Chao Zhang; Yi Xu; Yi Pan; Bing Yin; Trishul Chilimbi

arXiv:2506.00569 · cs.LG · June 3, 2025



TL;DR

AutoMixAlign (AMA) is a theoretically-grounded adaptive data mixing algorithm that improves multi-task preference optimization in large language models by balancing task performance during training.

Contribution

AMA introduces a novel minimax optimization framework with two algorithms, AMA-R and AMA-S, for adaptive data mixing in multi-task LLM alignment, backed by convergence guarantees.

Findings

1. AMA outperforms standard total loss optimization in multi-task alignment.

2. AMA surpasses model merging methods in multi-task preference performance.

3. Both AMA-R and AMA-S achieve an $O(1/\sqrt{T})$ convergence rate in convex settings.

Abstract

When aligning large language models (LLMs), their performance on various tasks (such as being helpful, harmless, and honest) depends heavily on the composition of their training data. However, selecting a data mixture that achieves strong performance across all tasks is challenging. Existing approaches rely on large ablation studies, heuristics, or human intuition, but these can be prohibitively expensive and suboptimal. We study this problem in the setting of preference optimization via DPO and introduce AutoMixAlign (AMA), a theoretically-grounded algorithm that adaptively mixes datasets during training to balance performance across tasks. AMA first trains specialist models for each task to determine losses that correspond to strong task performance. Then, it trains a generalist model using a novel minimax optimization that prioritizes tasks for which generalist model losses…
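
The general idea of minimax data mixing can be illustrated with a toy sketch. This is not the paper's AMA-R or AMA-S algorithm. It is a generic descent-ascent loop, under the assumption that each task is weighted according to the gap between the generalist's loss and a per-task reference loss standing in for the specialist's. The scalar quadratic task losses, the function names (`task_losses`, `adaptive_mix`), the exponentiated-gradient weight update, and all step sizes are hypothetical choices made only for illustration.

```python
import math


def task_losses(theta, targets):
    """Toy per-task losses: task i wants the scalar parameter theta near targets[i]."""
    return [0.5 * (theta - a) ** 2 for a in targets]


def adaptive_mix(targets, ref_losses, steps=5000, lr_theta=0.05, lr_alpha=0.05):
    """Generic descent-ascent sketch of
        min_theta  max_{alpha in simplex}  sum_i alpha_i * (L_i(theta) - c_i),
    where c_i = ref_losses[i] plays the role of a specialist's loss (an assumption).
    Returns the final theta and the final task weights alpha."""
    k = len(targets)
    theta = 0.0
    log_w = [0.0] * k  # unnormalized log-weights; softmax of these gives alpha
    alpha = [1.0 / k] * k
    for _ in range(steps):
        losses = task_losses(theta, targets)
        excess = [l - c for l, c in zip(losses, ref_losses)]
        # Ascent step on the weights: tasks with larger excess loss gain weight.
        log_w = [w + lr_alpha * e for w, e in zip(log_w, excess)]
        m = max(log_w)
        unnorm = [math.exp(w - m) for w in log_w]
        total = sum(unnorm)
        alpha = [u / total for u in unnorm]
        # Descent step on theta using the alpha-weighted gradient of the toy losses.
        grad = sum(a * (theta - tgt) for a, tgt in zip(alpha, targets))
        theta -= lr_theta * grad
    return theta, alpha


if __name__ == "__main__":
    targets = [0.0, 1.0, 4.0]
    ref_losses = [0.0, 0.0, 0.0]  # each toy "specialist" fits its own target exactly
    theta, alpha = adaptive_mix(targets, ref_losses)
    uniform_theta = sum(targets) / len(targets)  # minimizer of the unweighted total loss
    print("adaptive theta:", theta, "weights:", alpha)
    print("worst task loss, adaptive:", max(task_losses(theta, targets)))
    print("worst task loss, uniform :", max(task_losses(uniform_theta, targets)))
```

In this toy setting, minimizing the unweighted total loss settles at the mean of the targets, which leaves the outlying task with a comparatively large loss, while the adaptive weights shift toward whichever task is currently furthest from its reference loss. Real preference-optimization losses are not convex in the model parameters, so the sketch shows only the mechanism, not expected behavior on an LLM.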


Taxonomy

Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Simulation Techniques and Applications