If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Muhammad Khalifa; Yi-Chern Tan; Arash Ahmadian; Tom Hosking; Honglak Lee; Lu Wang; Ahmet Üstün; Tom Sherborne; Matthias Gallé

arXiv:2412.04144·cs.CL·February 5, 2025

TL;DR

This paper investigates merging large models trained on different tasks to create a Pareto-optimal model that outperforms individual checkpoints and merge baselines, effectively recycling suboptimal models.

Contribution

It introduces an optimization algorithm for merging large models that recycles suboptimal checkpoints to achieve Pareto efficiency across tasks.

Findings

1. Merged models outperform individual checkpoints.
2. Including most checkpoints improves merge quality.
3. Recycling suboptimal models enhances multi-task performance.

Abstract

Model merging has shown great promise at combining expert models, but the benefit of merging is unclear when merging "generalist" models trained on many tasks. We explore merging in the context of large (~100B) models, by recycling checkpoints that exhibit tradeoffs among different tasks. Such checkpoints are often created in the process of developing a frontier model, and the suboptimal ones are usually discarded. Given a pool of model checkpoints obtained from different training runs (e.g., different stages, objectives, hyperparameters, and data mixtures), which naturally show tradeoffs across different language capabilities (e.g., instruction following vs. code generation), we investigate whether merging can recycle such suboptimal models into a Pareto-optimal one. Our optimization algorithm tunes the weight of each checkpoint in a linear combination, resulting in such an optimal…
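To make the idea of a weighted linear combination of checkpoints concrete, here is a minimal sketch in Python. It is not the paper's implementation: checkpoints are modeled as toy dictionaries of float lists rather than ~100B-parameter models, the names `merge_checkpoints`, `random_search`, and `score_fn` are hypothetical, and plain random search is swapped in as a stand-in because the text above does not say which optimizer tunes the weights.

```python
import random


def merge_checkpoints(checkpoints, weights):
    """Return the weighted sum of parameter dicts (name -> list of floats)."""
    if len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    merged = {}
    for name in checkpoints[0]:
        # Walk the same parameter position across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) for col in columns]
    return merged


def random_search(checkpoints, score_fn, trials=200, seed=0):
    """Stand-in optimizer (an assumption): sample normalized weights and
    keep the merge with the highest score. Higher score_fn values are better."""
    rng = random.Random(seed)
    n = len(checkpoints)
    best_weights = [1.0 / n] * n  # start from uniform averaging
    best_score = score_fn(merge_checkpoints(checkpoints, best_weights))
    for _ in range(trials):
        raw = [rng.random() + 1e-12 for _ in range(n)]  # avoid an all-zero draw
        total = sum(raw)
        weights = [r / total for r in raw]
        score = score_fn(merge_checkpoints(checkpoints, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy usage: two one-tensor "checkpoints" and a made-up score that
# prefers merged parameters close to [0.3, 0.7].
ckpt_a = {"w": [1.0, 0.0]}
ckpt_b = {"w": [0.0, 1.0]}


def toy_score(model):
    return -((model["w"][0] - 0.3) ** 2 + (model["w"][1] - 0.7) ** 2)


weights, score = random_search([ckpt_a, ckpt_b], toy_score)
```

In a real setting, `score_fn` would evaluate the merged model on held-out tasks (for example, instruction following and code generation) and combine the results into a single number or a multi-objective criterion; how the paper does this is not stated in the excerpt above.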

Taxonomy

Topics: Innovation and Knowledge Management